This dissertation is devoted to exploring the problem of inference about the
baseline hazard function of Cox’s regression model, especially when the baseline
hazard function is assumed to be monotonic.

Assuming monotonicity of the baseline hazard function improves the efficiency
of estimation of the baseline hazard function in Cox’s regression model. The
isotonic regression method is applied to find the isotonic estimator of the baseline
hazard function. Maximum likelihood estimation of parameters under order
restrictions is closely related to the problem of isotonic regression.

A test for the monotonicity of the baseline hazard function is discussed for
the random censoring model. The strong consistency of the isotonic estimator of the
baseline hazard function is shown. To improve the maximum likelihood estimator
of the baseline hazard function when there are censored observations, we consider
an alternative based on the concept of a window. The asymptotic distribution of the
isotonic window estimator of the baseline hazard function is obtained for fixed time
$t$.




CHAPTER  1 
INTRODUCTION 

1.1  Review 

Recent  statistical  research  deals  extensively  with  the  methods  for  the  analysis 
of  survival  data  derived  from  laboratory  studies  of  animals  or  clinical  studies  of 
humans.  The  proportional  hazard  model  proposed  by  Cox  (1972)  is  an  important 
tool  for  analyzing  such  data.  We  intend  to  explore  the  problem  of  inference  about 
the  baseline  hazard  function  in  Cox’s  model.  Survival  data  are  different  from  the 
data  that  are  gathered  in  conventional  studies,  because  survival  data  often  include 
censoring  times,  which  precludes  exact  determination  of  the  key  dependent  variable, 
survival  time. 

In medical studies, experimenters frequently face censored data in clinical trials
for chronic diseases. Some patients may withdraw from the study; others may die
for unrelated reasons. Still others may be alive at last contact. For lost patients,
survival times are at least as large as the time elapsed between entry and the time they
withdraw or die for unrelated reasons. For patients still alive, survival times are
at least as large as the time from entry to the end of the study. These
observations, whether withdrawals or failures from competing causes during the study,
are defined to be censored observations. Loosely speaking, a censored observation
contains only partial information about the random variable of interest.

In survival analysis, we usually encounter three types of censoring. Type I censoring
occurs when we have a fixed censoring time, so that an observation is uncensored
only if failure occurs before the fixed censoring time.
Because of financial constraints, we may observe only the first $r$ failures out of $n$
possible observations. In other words, observation ceases after the $r$th failure. This
type of censoring is defined as type II censoring.

Both type I and type II censoring arise frequently in the engineering sciences. For
instance, suppose we use a batch of light bulbs as experimental units. We turn on all
the bulbs simultaneously at the start of the experiment, to investigate how long they
last on average. Since it may take an extremely long time for some bulbs to burn out,
we usually cannot wait until all of them have failed. We may be forced to stop the
experiment at a prespecified fixed time, or at the time when a prespecified fraction of
the bulbs have burned out. The first case is classified as type I censoring, while
the second case is an example of type II censoring.
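
The two stopping rules in the light bulb example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy lifetimes are hypothetical, and ties among lifetimes are ignored.

```python
def type_i_censor(lifetimes, c):
    """Type I censoring: the experiment stops at the fixed time c.

    Returns (observed_time, failed) pairs; failed is False for units
    still working when the experiment stops.
    """
    return [(min(t, c), t <= c) for t in lifetimes]


def type_ii_censor(lifetimes, r):
    """Type II censoring: the experiment stops at the r-th failure."""
    stop = sorted(lifetimes)[r - 1]  # time of the r-th ordered failure
    return [(min(t, stop), t <= stop) for t in lifetimes]


# Hypothetical bulb lifetimes (hours), for illustration only.
bulbs = [5.0, 1.0, 9.0, 3.0]
fixed_time_data = type_i_censor(bulbs, c=4.0)
fixed_count_data = type_ii_censor(bulbs, r=2)
```

Under type I censoring the number of observed failures is random while the stopping time is fixed; under type II censoring the roles are reversed.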

The  most  general  type  of  censoring  is  random  censoring.  Random  censoring 
occurs,  unlike  type  I and  II  censoring,  when  censoring  times  of  individuals  are  treated 
as  random  variables  from  an  unknown  distribution. 

Random  censoring  arises  in  medical  applications  with  animal  studies  or  clinical 
trials.  In  clinical  trials,  each  patient  may  join  the  study  at  different  times.  Our 
concern  is  to  measure  how  long  they  survive,  while  we  treat  them  with  one  or  several 
therapies.  But  we  may  lose  patients  for  reasons  unrelated  to  the  factors  being  studied. 
For example, a cancer patient may move and never report back to the clinic,
or he may refuse to continue receiving designated treatments which he considers
unsatisfactory. Another example occurs when a cancer patient dies in a car accident.
The cause of death is not cancer, but an accident which is not related to our study
objectives. We only know that the individual survived until the car accident. In this
dissertation, we shall concentrate on random censoring.

With random censoring, we make the following basic assumption: the censoring
mechanisms are “noninformative.” In other words, the censoring time is independent
of the survival time.
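
As a minimal sketch of what is observed under random censoring, the following Python fragment pairs each survival time with an independently chosen censoring time and records the smaller of the two together with a failure indicator. The function name `observe` and the numbers are assumptions made for illustration.

```python
def observe(survival_times, censoring_times):
    """Return (observed_time, failed) pairs under random censoring.

    observed_time = min(T, C); failed is True when the failure time T
    was seen, i.e. T <= C, and False when the subject was censored at C.
    """
    return [(min(t, c), t <= c) for t, c in zip(survival_times, censoring_times)]


# Hypothetical values: the second subject is lost to follow-up at time 1.0.
data = observe([2.0, 5.0], [3.0, 1.0])
```
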

One of the principal problems in survival analysis is that of developing methods
for  exploring  the  association  between  failure  times  and  explanatory  variables.  For 
example,  a clinical  study  is  designed  to  compare  several  treatment  programs  in  terms 
of  the  failure  times.  The  explanatory  variables  would  include  indicator  components 
for  treatment  as  well  as  other  prognostic  factors.  The  Cox  regression  model  is  a 
conventional  technique  for  investigating  the  relationship  between  survival  time  and 
covariates. 

The hazard or failure rate function is conceptually simple and is a specialized way
of representing the distribution of the failure times. The hazard function gives the
risk of failure at any time $t$, given that the individual has not failed prior to time $t$.
Cox (1972) suggests the following model, which presumes that covariates affect the
hazard function multiplicatively. Let $z$ be a row vector of $p$ covariates. Then the
Cox model specifies that

$$\lambda(t; z) = \lambda_0(t) \exp(z\beta), \qquad (1.1)$$

where $\lambda_0(t)$ is an unspecified function of time and $\beta$ is a $p$-dimensional column vector
of parameters. This model, though largely nonparametric, permits the estimation of
$\beta$ and leads to estimates of survival functions of the Kaplan and Meier (1958) type
when covariates are present in the data.

Since the ratio of the hazard functions corresponding to any two different $z$-values
is constant over $t$, (1.1) is often called a proportional hazard model. The factor
$\exp(z\beta)$ describes the instantaneous risk of failure for an individual with covariate
$z$ relative to that at a standard value $z = 0$. Since $\lambda_0(t)$ gives the hazard for an
individual under the standard condition $z = 0$, $\lambda_0(t)$ is called the baseline hazard
function.
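
The proportionality property can be checked numerically. The sketch below evaluates the hazard in (1.1) for an assumed Weibull-type increasing baseline; the baseline, the parameter values, and the function names are illustrative assumptions, not part of the model itself.

```python
import math


def cox_hazard(t, z, beta, baseline):
    """Hazard lambda(t; z) = lambda_0(t) * exp(z beta) from model (1.1)."""
    linear_predictor = sum(zj * bj for zj, bj in zip(z, beta))
    return baseline(t) * math.exp(linear_predictor)


def weibull_baseline(t, shape=1.5, scale=1.0):
    """Illustrative increasing baseline hazard (an assumption of this sketch)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


beta = [0.5, -1.0]
z_standard, z_other = [0.0, 0.0], [1.0, 2.0]

# The hazard ratio is the same at every time point.
ratios = [
    cox_hazard(t, z_other, beta, weibull_baseline)
    / cox_hazard(t, z_standard, beta, weibull_baseline)
    for t in (0.5, 1.0, 3.0)
]
```

Each entry of `ratios` equals $\exp(z\beta)$ for the second covariate vector, whatever baseline is supplied.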

One of the attractive features of the model (1.1) is that the nuisance function $\lambda_0(t)$
can be removed completely from inferences about $\beta$ (Cox, 1975). Another advantage
is that the covariate information on different individuals is easily incorporated into
(1.1).

Several assumptions on $\lambda_0(t)$ are possible in the analysis of the model (1.1). The
simplest one is to assume that $\lambda_0(t)$ is constant. This is equivalent to assuming an
underlying exponential distribution. The next simplest case is to assume that a family of
hazard functions has two unknown parameters. The Weibull distribution is an
example of a two-parameter family of hazard functions. A weaker assumption is that $\lambda_0(t)$
is arbitrary but monotonically increasing or decreasing in $t$. In certain situations, it is
reasonable to expect that the failure rate will increase monotonically, at least over
a certain interval of time. For certain electronic components, manufacturing defects
tend to cause failure early in life, so that the failure rate may be higher during the
initial period of age. This is a case in which a decreasing failure rate can be expected.
In many physical situations, the object becomes more likely to fail as it ages.
Examples are moving parts, human beings past youth, and so on. In such
cases, one would expect an increasing failure rate.

A main problem of considerable interest is the inference about the regression
parameters, allowing the baseline hazard function to be arbitrary. The conditional
likelihood approach suggested by Cox (1972) is a pioneering method leading to
inference about the regression parameter $\beta$.

Cox writes: “Suppose then $\lambda_0(t)$ is arbitrary. No information can be contributed
about $\beta$ by the time intervals in which no failure occurs because the components
$\lambda_0(t)$ might conceivably be identically zero in such intervals. We therefore argue
conditionally on the set of instants at which failures occur; in discrete time, we
shall condition also on the observed multiplicities. Once we require a method of
analysis holding for all $\lambda_0(t)$, consideration of this conditional distribution seems
inevitable.” He treats his conditional likelihood as an ordinary likelihood, so that
he finds maximum likelihood estimators and their asymptotic distribution for the
regression parameter $\beta$.

The method of marginal likelihood was developed by Kalbfleisch and Prentice
(1973) for the analysis of the regression parameters in model (1.1). The order
statistics and the rank statistics of observed failure times are the focus of their discussion.
They consider the group $G$ of differentiable, strictly monotone increasing
transformations of $(0, \infty)$ onto $(0, \infty)$. They argue that the estimation problem for the regression
parameters based on rank statistics is invariant under the group $G$ of
transformations on the failure time $t$. The group $G$ acts transitively on the order statistics, while
leaving the rank statistics invariant. Only the rank statistics can carry information
about the regression parameters when $\lambda_0(t)$ is completely unknown. That is, the
rank statistics are marginally sufficient for the estimation of the regression
parameters. The marginal likelihood of the regression parameters is proportional to the
probability that the rank vector should be observed from the marginal distribution
of the ranks. For censored data, the marginal likelihood becomes more complicated if
the number of ties is large, but the computation can be simplified by using an
approximation suggested by Breslow (1974). For uncensored data, the marginal likelihood
is identical to the conditional likelihood.

The partial likelihood approach to inferences about the regression parameters,
which gives results essentially equivalent to those given by marginal likelihood, is
described by Cox (1975). The partial likelihood is especially useful when it is
appreciably simpler than the full likelihood, for example, when it involves only the
parameters of interest and no nuisance parameters. A reduction of dimensionality,
when we have many nuisance parameters, is possible by using partial likelihood. This
approach can be especially fruitful when $\lambda_0(t)$ is assumed to be an unknown
arbitrary function, to be treated as a nuisance function. Cox (1975) shows that the
marginal likelihood derived by Kalbfleisch and Prentice (1973) is equivalent to a partial
likelihood.
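
To make the likelihoods discussed above concrete, the following Python sketch evaluates the log partial likelihood in its usual form for data without tied failure times, $\ell(\beta) = \sum_{i:\,\text{failure}} \{ z_i\beta - \log \sum_{j:\, y_j \ge y_i} \exp(z_j\beta) \}$. The function name and the toy data are assumptions introduced for illustration; no handling of ties is attempted.

```python
import math


def partial_log_likelihood(beta, times, events, covariates):
    """Log partial likelihood for the Cox model, assuming no tied failures.

    times      : observed times (failure or censoring)
    events     : True where the observed time is a failure
    covariates : one row of p covariate values per individual
    """
    scores = [sum(b * z for b, z in zip(beta, row)) for row in covariates]
    loglik = 0.0
    for i, (t_i, failed) in enumerate(zip(times, events)):
        if not failed:
            continue  # censored individuals contribute only through risk sets
        # Risk set: everyone still under observation just before t_i.
        risk = sum(math.exp(scores[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += scores[i] - math.log(risk)
    return loglik


# Hypothetical data: three uncensored individuals with one covariate.
times = [2.0, 5.0, 9.0]
events = [True, True, True]
covariates = [[0.0], [1.0], [2.0]]
value_at_zero = partial_log_likelihood([0.0], times, events, covariates)
```

Because only the ordering of the observed times enters the computation, replacing each time by a strictly increasing transform of itself leaves the value unchanged, which mirrors the invariance argument based on rank statistics.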


1.2  The  Problem 

All of the arguments above assume that $\lambda_0(t)$ is
completely unspecified. If we have additional information on $\lambda_0(t)$, for example,
monotonicity or constancy of $\lambda_0(t)$, such information should be useful for inference
about the regression parameters in (1.1). From a practical point of view, we can have
empirical or prior information about the hazard rate without taking into account the
effect of covariates. If experimental units are like light bulbs, machine tools, or car
engines, we are concerned about the way in which the items in question wear out.
It is reasonable to assume that the failure rate of aging items will tend to increase,
when we do not consider the effect of the covariates.

The efficiency of inference about the regression parameters under various
assumptions about $\lambda_0(t)$ is referred to as a “major outstanding problem” by Cox (1972).
Many researchers have attempted to answer this problem. Meshalkin and
Kagan (1972) showed that knowledge of the baseline hazard function is helpful in
reducing the asymptotic variances of the estimates of $\beta$ in the model (1.1) by 10 to
20%. They assume that the baseline hazard function has an exponential form of a
linear function of $t$. Efron (1977) argues that if the class of nuisance functions is large,
then the inferences about the regression parameters based on partial or marginal
likelihood are asymptotically equal to those based on all the data. He also carries out
the calculation of an information matrix which shows that Cox’s partial likelihood
has full asymptotic efficiency under mild conditions. Oakes (1977) also deals with
the same problem from a different point of view. Efron (1977) and Oakes (1977) use
different parametrizations of the baseline hazard function. In Efron’s formulation,
the baseline hazard function may depend on the regression parameters as well as the
nuisance parameters. The baseline hazard function is assumed to be either known
completely or known up to a multiplicative constant by Oakes. Explicit formulas for
the asymptotic variances of the estimates of $\beta$ are derived informally and compared.
Oakes also concludes that the amount of information lost through a lack of knowledge
of the baseline hazard function in any specified data set is usually small.

Inferences about the baseline hazard function $\lambda_0(t)$ are also an important part of
survival analysis, because $\lambda_0(t)$ reveals the survival pattern. Breslow (1974) suggests
that $\lambda_0(t)$ be approximated by a step function which has discontinuities at observed
failure times. He considers the joint likelihood function of $\lambda_0(t)$ and $\beta$ and derives
the maximum likelihood estimators of $\lambda_0(t)$ and $\beta$. Beyond Breslow’s paper, there
is, however, little discussion on inferences about $\lambda_0(t)$ in the literature.
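
A step-function estimate of this kind can be sketched as follows. The code uses a commonly stated form in which the hazard on the interval ending at the $i$th distinct failure time equals the number of failures there divided by the interval length times the sum of $\exp(z_j\beta)$ over the risk set. The treatment of censored observations (counted in a risk set only if still under observation at the failure time) and all names are assumptions of this sketch, not a transcription of Breslow's derivation.

```python
import math


def breslow_step_hazard(times, events, covariates, beta):
    """Piecewise-constant baseline hazard with jumps at observed failure times.

    Returns a list of (failure_time, hazard_on_interval_ending_there).
    """
    scores = [math.exp(sum(b * z for b, z in zip(beta, row))) for row in covariates]
    failure_times = sorted({t for t, failed in zip(times, events) if failed})
    steps, previous = [], 0.0
    for t_i in failure_times:
        deaths = sum(1 for t, failed in zip(times, events) if failed and t == t_i)
        # Sum of exp(z beta) over individuals still at risk at t_i.
        risk = sum(s for t, s in zip(times, scores) if t >= t_i)
        steps.append((t_i, deaths / ((t_i - previous) * risk)))
        previous = t_i
    return steps


# Hypothetical uncensored data with beta = 0.
steps = breslow_step_hazard([1.0, 2.0, 3.0], [True, True, True], [[0.0]] * 3, [0.0])
```

The resulting step heights need not be monotone in time, which is the motivation for imposing an order restriction on them.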

For the one population problem, i.e., for $\beta = 0$ in (1.1), many results concerning
inferences about hazard functions under order restrictions are available in the literature.
Grenander (1956) was the first to use the concept of the greatest convex minorant to
estimate the failure rate under the assumption that the failure rate is monotonically
increasing. The various problems of inference under order restrictions are discussed
by Barlow et al. (1972) and Robertson et al. (1988), who deal with a wide class
of extremum problems whose solutions are provided by isotonic regression. They
discuss the problems of estimating monotone failure functions and prove strong
consistency of the isotonic estimates of monotone failure functions. They examine
tests for exponentiality against monotone failure rate alternatives in situations with
type II censoring and random censoring.
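
For readers unfamiliar with isotonic regression, the following is a minimal sketch of the pool-adjacent-violators algorithm, the standard way of computing a weighted least squares fit subject to a nondecreasing order restriction. The function name `pava` and the example values are assumptions made for illustration.

```python
def pava(values, weights=None):
    """Weighted isotonic (nondecreasing) regression by pooling adjacent violators."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block is [weighted mean, total weight, number of points]
    for value, weight in zip(values, weights):
        blocks.append([float(value), float(weight), 1])
        # Pool backwards while the last two block means violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, w2, n2 = blocks.pop()
            mean1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(mean1 * w1 + mean2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical unrestricted estimates that violate monotonicity at one place.
smoothed = pava([1.0, 3.0, 2.0, 4.0])
```

Adjacent values that break the ordering are replaced by their weighted average, and the process repeats until the whole sequence is nondecreasing.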

We intend to extend the results suggested by Barlow and coworkers (1972) to
estimate the baseline hazard function under the assumption that it is increasing in
time $t$, and to test whether the baseline hazard function is constant or increasing in
$t$.


1.3  Preview

The objective of this dissertation is to estimate the baseline hazard function,
$\lambda_0(t)$, in the Cox regression model, under the assumption that $\lambda_0(t)$ is increasing or
decreasing monotonically, and to examine the consistency and normality of its estimator.
We are also concerned with the test for the monotonicity of the baseline hazard
function.

Chapter  2 contains  tests  for  constant  failure  rate  versus  increasing  failure  rate. 
We  develop  a procedure  for  Type  II  censoring  and  then  extend  it  to  random  censoring 
with  covariates. 

In Chapter 3, we focus on estimating the baseline hazard function by a step
function. Isotonic regression is adapted to find the estimators by solving likelihood
equations under order restrictions on parameters.

In Chapter 4, we prove the strong consistency of the estimate of $\lambda_0(t)$ using the
strong consistency of the estimator for $\beta$ obtained by partial likelihood.

In Chapter 5, we deal with the problem of improving the maximum likelihood
estimator of $\lambda_0(t)$ by considering windows. The asymptotic distribution of the isotonic
estimator of $\lambda_0(t)$ is found. Finally, the optimal size of the window is derived by
minimizing the mean square error of its estimator.

Chapter 6 contains the results of a simulation study which show the superiority
of the isotonic estimator over the maximum likelihood estimator of $\lambda_0(t)$ under the
monotonicity assumption.


CHAPTER  2 

TEST  FOR  MONOTONICITY  OF  THE  BASELINE  HAZARD  FUNCTION 


2.1  Introduction 

When it is known a priori that the baseline hazard function is an increasing
function, that information can be used to find better estimates of the baseline
hazard function. Further, when no information about the baseline hazard function is
available, it is of interest to check whether it is monotone increasing or not, before we
estimate the baseline hazard function.

In  this  chapter,  we  will  develop  methods  by  which  we  can  test  the  hypothesis  of 
constant  versus  increasing  baseline  hazard  function.  That  is, 

$$H_0: \lambda_0(t) \text{ is constant} \quad \text{versus} \quad H_1: \lambda_0(t) \text{ is increasing}. \qquad (2.1)$$


The  cumulative  total  time  on  test  statistic  is  a fundamental  tool  used  to  develop  the 
proposed  test  procedure. 

The problem of testing the hypotheses (2.1) without the covariates (i.e., $\beta = 0$ in
(1.1)) is reviewed in section 2. In section 3, we extend the concept of total time to
test the hypotheses (2.1) under Cox’s regression model with random censoring. In
section 4, we illustrate a graphical method of testing the hypotheses (2.1).


2.2  Test for Isotonicity without Covariates

We consider the problem of testing the hypotheses (2.1) where $\lambda_0(t)$ is given by
$f(t)/\{1 - F(t)\}$, with $F$ the distribution function of the failure time $T$ and $f$ its
density. This problem is equivalent to a test of the hypothesis that $F$ is an
exponential distribution against the increasing failure rate alternative, since under the null
hypothesis in (2.1), the failure time $T$ has an exponential distribution with mean $1/\lambda$,
where $\lambda$ denotes the constant value of the hazard.

Bickel  and  Doksum  (1969)  discuss  the  problem  of  testing  for  the  given  hypothesis 
(2.1)  when  a sample  of  complete  observations  is  available.  Their  test  is  based  on 
the  ranks  of  the  normalized  spacings  between  ordered  observations.  Their  results 
are  extended  by  Barlow  and  Doksum  (1972)  to  the  case  where  we  have  Type  II 
censoring.  The  total  time  on  test  statistic  is  considered  as  a key  to  develop  the  test 
for  the  hypotheses  (2.1). 

In  this  section,  we  summarize  the  results  in  Barlow  and  Doksum. 

Suppose a study continues until the $k$th failure occurs, at which time all
surviving individuals are assumed to be censored. We obtain the first $k$ ordered
observations out of a sample of $n$ individuals:

$$Y_{u(1)} < Y_{u(2)} < \cdots < Y_{u(k)}, \qquad (1 \le k \le n).$$

Let $D_{n:i} = (n - i + 1)(Y_{u(i)} - Y_{u(i-1)})$, $i = 1, \ldots, n$, be the normalized sample spacings,
where $Y_{u(0)} = 0$. It is well known that when the failure time has an exponential
distribution with parameter $\lambda$, the normalized spacings $D_{n:i}$ are independent and
exponentially distributed random variables with mean $1/\lambda$.
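
The normalized spacings are simple to compute from the ordered observations. The sketch below is illustrative only; the function name and the three toy observations are assumptions.

```python
def normalized_spacings(ordered_times, n):
    """Return D_{n:i} = (n - i + 1) * (Y_(i) - Y_(i-1)), with Y_(0) = 0.

    ordered_times holds the first k ordered failure times out of n units.
    """
    spacings, previous = [], 0.0
    for i, y in enumerate(ordered_times, start=1):
        spacings.append((n - i + 1) * (y - previous))
        previous = y
    return spacings


# Hypothetical complete sample of n = 3 ordered failure times.
spacings = normalized_spacings([1.0, 3.0, 6.0], n=3)
```

For a complete sample the spacings add up to the sum of the observations, since both quantities measure the total time all units spent on test.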

We have the following theorem that forms the basis for many tests for
exponentiality versus increasing (or decreasing) failure rate.

Theorem 2.2.1 If $F$ has an increasing failure rate, then $D_{n:i} = (n - i + 1)(Y_{u(i)} - Y_{u(i-1)})$,
$i = 1, \ldots, n$, is stochastically decreasing in $i$ $(= 1, 2, \ldots, n)$ for fixed $n$.

Proof of Theorem 2.2.1 See Breslow et al. (1972).

Under the alternative hypothesis of increasing failure rate, Theorem 2.2.1 implies
that

$$D_{n:1} \stackrel{st}{\ge} D_{n:2} \stackrel{st}{\ge} \cdots \stackrel{st}{\ge} D_{n:n},$$

while under the null hypothesis of constant failure rate the above theorem implies that

$$\Pr(D_{n:i} < D_{n:j}) = \tfrac{1}{2}, \qquad i \ne j,$$

where $\stackrel{st}{\ge}$ denotes stochastic ordering. We have zero slope in the linear regression of
the $D_{n:n-i}$ on the value $i$ under exponentiality of the distribution $F$. Since $D_{n:i}$ tends
to be larger than $D_{n:j}$ for $i < j$, the slope in the linear regression is positive under
the assumption of increasing failure rate.

When the slope is nonzero, it is, as usual, sensitive to changes in scale. Therefore
it is desirable to make the statistic scale invariant by dividing the slope by the average
of the $D_{n:i}$, i.e., $\frac{1}{n}\sum_{i=1}^{n} D_{n:i}$.

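The slope in the regression of the $D_{n:n-i}$ on $i$ is linearly related to the sum of the
areas of triangles formed by $(0, 0)$, $(i, 0)$, and $(i, D_{n:n-i})$ for $i = 0, \ldots, (n - 1)$. Let us
define

$$V_n \stackrel{def}{=} \frac{\frac{1}{n}\sum_{i=1}^{n} (i - 1) D_{n:n-i+1}}{\frac{1}{n}\sum_{i=1}^{n} D_{n:i}}.$$

We can rewrite $V_n$ as

$$
\begin{aligned}
V_n &= \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{i-1} D_{n:n-i+1} \Big/ \frac{1}{n}\sum_{i=1}^{n} D_{n:i} \\
    &= \frac{1}{n}\sum_{j=1}^{n-1}\sum_{i=j+1}^{n} D_{n:n-i+1} \Big/ \frac{1}{n}\sum_{i=1}^{n} D_{n:i} \\
    &= \frac{1}{n}\sum_{j=1}^{n-1}\sum_{i=1}^{n-j} D_{n:i} \Big/ \frac{1}{n}\sum_{i=1}^{n} D_{n:i} \\
    &= \frac{1}{n}\sum_{k=1}^{n-1}\sum_{i=1}^{k} D_{n:i} \Big/ \frac{1}{n}\sum_{i=1}^{n} D_{n:i}. \qquad (2.2)
\end{aligned}
$$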


The term $T_n(Y_{u(i)}) = \sum_{j=1}^{i} D_{n:j}$ in (2.2) is recognized as the total time on test
statistic up to the $i$th failure. If $n$ individuals are placed on test when testing
commences, then $n$ individuals survive up to time $Y_{u(1)}$, $(n - 1)$ individuals survive through
the interval $[Y_{u(1)}, Y_{u(2)})$, etc. In general, $(n - i + 1)$ individuals survive through the
interval $[Y_{u(i-1)}, Y_{u(i)})$. Hence

$$T_n(Y_{u(i)}) = \sum_{j=1}^{i} D_{n:j} = nY_{u(1)} + (n - 1)(Y_{u(2)} - Y_{u(1)}) + \cdots + (n - i + 1)(Y_{u(i)} - Y_{u(i-1)})$$

is interpreted as the total time on test statistic, which is the sum of times for
individuals who survived up to the $i$th failure. The following definition will be needed to
develop a test for the hypotheses given in (2.1).

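Definition. Given the first $k$ $(1 < k \le n)$ ordered observations out of $n$ individuals,

$$V_k \stackrel{def}{=} \frac{\frac{1}{k}\sum_{i=1}^{k-1}\sum_{j=1}^{i} D_{n:j}}{\frac{1}{k}\sum_{i=1}^{k} D_{n:i}} = \frac{\sum_{j=1}^{k-1} T_n(Y_{u(j)})}{T_n(Y_{u(k)})} \qquad (2.3)$$

is called the cumulative total time on test statistic, where $D_{n:i} = (n - i + 1)(Y_{u(i)} - Y_{u(i-1)})$
for $i = 1, \ldots, n$.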


Under the alternative hypothesis that $\lambda(t)$ is increasing, Theorem 2.2.1 shows that
$V_k$ tends to be large, because $D_{n:j}$ is stochastically decreasing in $j$. We therefore
reject the null hypothesis that $\lambda(t)$ is constant when $V_k$ is sufficiently large. Hence
the statistic $V_k$ $(1 < k \le n)$ provides a good tool for determining whether the failure
rate is constant or increasing.

In order to perform the test based on $V_k$ we need to find its null distribution. We
shall use the following well-known theorem to find a statistic which is stochastically
equivalent to $V_k$ and whose distribution is known.


Theorem 2.2.2 If the failure times have an exponential distribution with parameter
$\lambda$, then

$$V_k \stackrel{st}{=} \sum_{j=1}^{k-1} U_j, \qquad (2.4)$$

where $U_j$ $(j = 1, \ldots, k - 1)$ are independent uniform random variables on $(0, 1)$.

To prove this theorem we need the following lemmas.

Lemma 2.2.1 Conditional on $\sum_{i=1}^{n} D_{n:i} = m$, $D_{n:1}, \ldots, D_{n:n-1}$ have a uniform
distribution over the region

$$d_i > 0, \quad i = 1, \ldots, n - 1, \qquad \sum_{i=1}^{n-1} d_i < m,$$

under the null hypothesis that $\lambda_0(t)$ is constant.

Proof of Lemma 2.2.1 See Appendix A.

Lemma 2.2.2 Let $X_i = D_{n:i}/m$, $i = 1, \ldots, n - 1$. Then under the null hypothesis
that $\lambda_0(t)$ is constant, and conditional on $\sum_{i=1}^{n} D_{n:i} = m$, $X_1, \ldots, X_{n-1}$ have a uniform
distribution over the region

$$x_i > 0, \quad i = 1, \ldots, n - 1, \qquad x_1 + \cdots + x_{n-1} < 1. \qquad (2.5)$$

Proof of Lemma 2.2.2 Lemma 2.2.2 is a consequence of Lemma 2.2.1
by a scale change.

Proof of Theorem 2.2.2 This theorem is stated by Barlow (1972), but no proof of the
theorem is given there. Hence, we have provided a proof in Appendix B.

Theorem 2.2.2 demonstrates that $V_k$ can be considered as a sum of $(k - 1)$ i.i.d.
uniform random variables on $(0, 1)$. Hence it is possible to compute $C_{k,(1-\alpha)}$ such that

$$\alpha = \Pr[\text{Reject } H_0 \mid H_0 \text{ is true}] = \Pr[V_k > C_{k,(1-\alpha)} \mid \lambda(t) \text{ is constant}].$$
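
Critical numbers of this kind can be reproduced numerically. The sketch below evaluates the distribution function of a sum of $m = k - 1$ independent uniform variables by its standard closed form and inverts it by bisection. The function names and the tolerance are assumptions of the sketch, and the computation is offered as a cross-check rather than as the method used to produce the published tables.

```python
import math


def uniform_sum_cdf(x, m):
    """P(U_1 + ... + U_m <= x) for independent uniform(0, 1) variables."""
    if x <= 0:
        return 0.0
    if x >= m:
        return 1.0
    # Standard alternating-sum formula for the sum of m uniforms.
    total = sum(
        (-1) ** j * math.comb(m, j) * (x - j) ** m
        for j in range(int(math.floor(x)) + 1)
    )
    return total / math.factorial(m)


def critical_value(m, prob, tol=1e-10):
    """Smallest c with P(sum of m uniforms <= c) >= prob, found by bisection."""
    low, high = 0.0, float(m)
    while high - low > tol:
        mid = (low + high) / 2.0
        if uniform_sum_cdf(mid, m) < prob:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


upper_90_for_4 = critical_value(4, 0.90)  # row k - 1 = 4, upper 10% point
lower_10_for_4 = critical_value(4, 0.10)  # row k - 1 = 4, lower 10% point
```

By the symmetry of the distribution about $m/2$, the upper and lower points for the same row add up to $m$.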

Table 2.1. Percentiles $C_{k,1-\alpha}$ of the Cumulative Total Time on Test Statistic, $V_k$,
under $H_0$

                       1 - α
k - 1    0.900    0.950    0.975    0.990    0.995
  2      1.553    1.684    1.776    1.859    1.900
  3      2.157    2.331    2.469    2.609    2.689
  4      2.753    2.953    3.120    3.300    3.411
  5      3.339    3.565    3.754    3.963    4.097
  6      3.917    4.166    4.367    4.610    4.762
  7      4.489    4.759    4.988    5.244    5.413
  8      5.056    5.346    5.592    5.869    6.053
  9      5.619    5.927    6.189    6.487    6.683
 10      6.178    6.504    6.781    7.097    7.307
 11      6.735    7.077    7.369    7.702    7.924
 12      7.289    7.647    7.953    8.302    8.535

k = number of failures observed in data.


Table 2.2. Percentiles $C_{k,\alpha}$ of the Cumulative Total Time on Test Statistic, $V_k$,
under $H_0$

                         α
k - 1    0.100    0.050    0.025    0.010    0.005
  2      0.447    0.316    0.224    0.141    0.100
  3      0.843    0.669    0.531    0.391    0.311
  4      1.247    1.047    0.880    0.700    0.589
  5      1.661    1.435    1.246    1.037    0.903
  6      2.083    1.834    1.633    1.390    1.238
  7      2.511    2.241    2.012    1.756    1.587
  8      2.944    2.654    2.408    2.131    1.947
  9      3.381    3.073    2.811    2.513    2.317
 10      3.822    3.496    3.219    2.903    2.693
 11      4.265    3.923    3.631    3.298    3.076
 12      4.711    4.353    4.047    3.698    3.465

k = number of failures observed in data.

From Theorem 2.2.2, it follows that the null distribution of $V_k$ is approximately
normal for large $k$: the standardized statistic
$\{12(k - 1)\}^{1/2}[(k - 1)^{-1}V_k - \tfrac{1}{2}]$
converges to a $N(0, 1)$ random variable under the null hypothesis that $\lambda(t)$ is constant.
To perform the test we use the critical numbers $C_{k,(1-\alpha)}$, which are tabulated by Barlow
and Proschan (1968) for small $k$ (see Table 2.1). For example, if $1 - \alpha = .90$ we look
in row $k - 1 = 4$ and find $C_{4,0.90} = 2.753$ in Table 2.1 and the lower critical number
1.247 in Table 2.2. If the observed value of
$V_k$ is greater than 2.753, it is concluded that the true distribution has an
increasing failure rate at the 10% significance level. If the observed value of $V_k$ is less
than 1.247, we reach the opposite conclusion, namely that the true distribution has
a decreasing failure rate at the 10% significance level.

Example (Barlow, 1972). In Table 2.3, we list the times between air-conditioner
failures on selected aircraft. After roughly 2000 hours of service the planes received
major overhauls; the failure interval containing a major overhaul is omitted from the
listing, since the length of that failure interval may have been affected by the overhaul.

We wish to determine whether the intervals between failures have an exponential
distribution or whether there is a wearout trend as the equipment ages. In the event that
there is a wearout trend, maintenance should be scheduled according to equipment
age rather than the present policy.

The $V_k$ associated with the data in Table 2.3 are given in Table 2.4. Since the
sample size for plane 7908 exceeds the range of Table 2.1, we can use the fact that,
for large $k$ and under $H_0$, $V_k$ is approximately
normal with mean $\tfrac{1}{2}(k - 1)$ and variance $\tfrac{1}{12}(k - 1)$. By standardizing $V_k$,
it follows that

$$Z = \{12(k - 1)\}^{1/2}\left[(k - 1)^{-1}V_k - \tfrac{1}{2}\right]$$

Table 2.3. Interval Between Failures of Air Conditioning Equipment on Jet Aircraft

                     aircraft
 K     7907    7908    7915    7916    8044
 1      194     413     359      50     487
 2       15      14       9     254      18
 3       41      58      12       5     100
 4       29      37     270     283       7
 5       33     100     603      35      98
 6      181      65       3      12       5
 7                9     104              85
 8              169       2              91
 9              447     436              43
10              184                     230
11               36                       3
12              201                     130
13              118
14               34
15               31
16               18
17               18
18               67
19               57
20               62
21                7
22               22
23               34

Major overhaul before the 14th observation

Table 2.4. Statistics and Conclusion

plane   sample size k   statistic V_k                 conclusion
7907          6         V_6 = 2.243                   Exponential at the 10% level
7908         23         V_23 = 8.829, Z = -1.607      Exponential at the 10% level
7915          9         V_9 = 2.80                    Decreasing failure rate at the 10% level
7916          6         V_6 = 1.67                    Exponential at the 10% level
8044         12         V_12 = 4.22                   Exponential at the 10% level

is  approximately  normally  distributed  with  mean  0 and  variance  1. 
Using the fact that

$$V_k \stackrel{st}{=} U_1 + \cdots + U_{k-1}$$

is symmetric about $\frac{1}{2}(k-1)$, i.e.,

$$\Pr\left(V_k - \tfrac{k-1}{2} > x\right) = \Pr\left(V_k - \tfrac{k-1}{2} < -x\right),$$

we can obtain the lower critical numbers for $V_k$. Those numbers are given in Table 2.2. If $V_k$ is less than the lower critical number, we conclude that the data are from a distribution with a decreasing failure rate. For plane 7915 we obtain $V_9 = 2.80$, which is less than 2.944 in Table 2.2. Hence, we conclude at the 10% significance level that the failure times of the air-conditioner of plane 7915 have a decreasing failure rate.
2.3  Test for Isotonicity with Covariates.

We would like to extend the method introduced in Section 1 to test the monotonicity of the baseline hazard function in Cox's regression model. Our concern is to determine whether the baseline hazard rate is constant or increasing, regardless of the values of the covariates. We assume that the observations are subject to random censoring. Barlow and Proschan (1969) deal with the same problem for samples subject to random censoring when no covariates are available. It does not substantially complicate matters to develop the method to test the hypothesis about the baseline hazard function when covariates are available.

Suppose $n$ individuals are put on test at time $t = 0$. Among the $n$ individuals, we assume that $k$ individuals failed and the remaining $(n - k)$ individuals are censored. Let $Y_{u(1)} < Y_{u(2)} < \cdots < Y_{u(k)}$ be the ordered failure times with corresponding covariates $z_{(1)}, z_{(2)}, \cdots, z_{(k)}$. Suppose that $m_i$ individuals with covariates $z_{(i1)}, \cdots, z_{(im_i)}$ are censored in the interval $[Y_{u(i)}, Y_{u(i+1)})$, for $i = 0, 1, \cdots, k$, where $Y_{u(0)} = 0$ and $Y_{u(k+1)} = \infty$.

The set of actual survival times for the $n$ individuals can be characterized by

$$y_{u(1)} < y_{u(2)} < \cdots < y_{u(k)}, \qquad y_{u(i)} < y_{c(i1)}, \cdots, y_{c(im_i)},$$

where $y_{c(i1)}, \cdots, y_{c(im_i)}$ are the failure times associated with individuals censored in the interval $[y_{u(i)}, y_{u(i+1)})$.

Now let $h(y_{u(i)})$ denote the conditional probability that $Y_{u(i)} < Y_{c(i1)}, \cdots, Y_{c(im_i)}$, given $Y_{u(i)} = y_{u(i)}$, $i = 0, 1, \cdots, k$. Then (see Kalbfleisch & Prentice, 1980)

$$h(y_{u(i)}) = \Pr[Y_{u(i)} < Y_{c(i1)}, \cdots, Y_{c(im_i)} \mid Y_{u(i)} = y_{u(i)}].$$

Define

$$n(y_{u(i)}) = \sum_{j \in R(y_{u(i)})} \exp(z_j\beta) = \sum_{j=i}^{k}\Big[\exp(z_{(j)}\beta) + \sum_{l=1}^{m_j}\exp(z_{(jl)}\beta)\Big], \qquad i = 0, 1, \cdots, k, \tag{2.7}$$

where $R(t)$ is the risk set prior to $t$.

Note that if $\beta = 0$, then $(y_{u(i)} - y_{u(i-1)})(n(y_{u(i-1)}) - 1)$ is the total time on test between the $(i-1)$st and the $i$th observed failures.

Theorem 2.3.1  Let

$$U_i = \int_{Y_{u(i-1)}}^{Y_{u(i)}} n(t)\lambda_0(t)\,dt, \qquad i = 1, \cdots, k,$$

where $n(t) = \sum_{l \in R(t)}\exp(z_l\beta)$ and $Y_{u(0)} = 0$. Then $U_i$, $i = 1, \cdots, k$, are independently distributed with density $\exp(-u)$.

Proof of Theorem 2.3.1  Let

$$S_0(t) = \int_0^t n(x)\lambda_0(x)\,dx.$$

$S_0(t)$ is well defined up to the first observed failure, $y_{u(1)}$, since $n(t)$ depends only upon the numbers of the observed censored individuals less than $y_{u(1)}$ which are greater than $t$. By definition

$$U_1 = \int_0^{Y_{u(1)}} n(x)\lambda_0(x)\,dx = S_0(Y_{u(1)}).$$

Now to show that $U_1$ has density $\exp(-u_1)$, we compute

$$
\begin{aligned}
\Pr(U_1 > u_1) &= \Pr(S_0(Y_{u(1)}) > u_1) \\
&= \Pr(Y_{u(1)} > S_0^{-1}(u_1)) \\
&= \exp\Big[-\sum_{j=1}^{k}\Big(\exp(z_{(j)}\beta) + \sum_{l=1}^{m_j}\exp(z_{(jl)}\beta)\Big)\int_0^{S_0^{-1}(u_1)}\lambda_0(x)\,dx\Big] \\
&= \exp\Big[-\int_0^{S_0^{-1}(u_1)} n(x)\lambda_0(x)\,dx\Big] \\
&= \exp[-S_0(S_0^{-1}(u_1))] \\
&= \exp(-u_1),
\end{aligned}
$$

using (2.6). Next we will show that $U_2$ is independent of $U_1$ and also exponentially distributed with mean 1. Let

$$U_2 = \int_{Y_{u(1)}}^{Y_{u(2)}} n(t)\lambda_0(t)\,dt \quad\text{and}\quad S_{x_1}(t) = \int_{x_1}^{t} n(x)\lambda_0(x)\,dx.$$

Then the conditional probability that $U_2$ is greater than $u_2$ given that the first failure occurs at $x_1$ is

$$
\begin{aligned}
\Pr(U_2 > u_2 \mid Y_{u(1)} = x_1) &= \Pr(S_{x_1}(Y_{u(2)}) > u_2 \mid Y_{u(1)} = x_1) \\
&= \Pr(Y_{u(2)} > S_{x_1}^{-1}(u_2) \mid Y_{u(1)} = x_1) \\
&= \exp[-S_{x_1}(S_{x_1}^{-1}(u_2))] \\
&= \exp[-u_2].
\end{aligned}
$$

Thus $U_2$ is independent of $U_1$ and also exponentially distributed with mean 1. Continuing in this manner by conditioning on previous events, we prove that $U_i$, for $i = 1, \cdots, k$, are independent and distributed exponentially with mean 1.

Since we wish to test the null hypothesis that the baseline hazard function is constant, we put $\lambda_0(t) = \lambda$. Then by a simple transformation, we note that

$$U_i = \int_{Y_{u(i-1)}}^{Y_{u(i)}} n(u)\,du, \qquad i = 1, \cdots, k,$$

are independent exponentially distributed with mean $1/\lambda$.

Theorem 2.3.2  Let us define $V_k$ by

$$V_k = \sum_{i=1}^{k-1}\frac{\int_0^{Y_{u(i)}} n(u)\,du}{\int_0^{Y_{u(k)}} n(u)\,du} \stackrel{st}{=} \sum_{i=1}^{k-1}\frac{\sum_{j=1}^{i} U_j}{\sum_{j=1}^{k} U_j},$$

where the $U_i$'s are i.i.d. random variables with exponential distributions with parameter 1.

Then under $H_0$, $V_k$ is distributed as the sum of $(k - 1)$ independent uniform random variables over (0, 1) when $\beta$ in $n(u)$ is known.

Proof of Theorem 2.3.2  The proof follows immediately from Theorem 2.2.2.

The next theorem implies that $V_k$ is a reasonable test statistic for the given hypothesis.
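Theorem 2.3.3  If $\lambda_0(u)$ is increasing in $u > 0$ and $n(u) > 0$ for $u > 0$, then

$$V_k \stackrel{st}{\ge} \sum_{i=1}^{k-1}\frac{\sum_{j=1}^{i} U_j}{\sum_{j=1}^{k} U_j}$$

where $U_1, U_2, \cdots, U_k$ are independently distributed as exponential random variables with mean 1.

Proof of Theorem 2.3.3  Since $n(u) > 0$ and $\lambda_0(u)$ is increasing in $u$, we have

$$\int_0^t \lambda_0(u)n(u)\,du \le \lambda_0(t)\int_0^t n(u)\,du,$$

which implies that $\int_0^t \lambda_0(u)n(u)\,du \big/ \int_0^t n(u)\,du$ is increasing in $t > 0$. Hence for $i = 1, \cdots, k$ we have

$$\frac{\int_0^{Y_{u(i)}}\lambda_0(u)n(u)\,du}{\int_0^{Y_{u(i)}} n(u)\,du} \le \frac{\int_0^{Y_{u(k)}}\lambda_0(u)n(u)\,du}{\int_0^{Y_{u(k)}} n(u)\,du}$$

which is equivalent to

$$\frac{\int_0^{Y_{u(i)}}\lambda_0(u)n(u)\,du}{\int_0^{Y_{u(k)}}\lambda_0(u)n(u)\,du} \le \frac{\int_0^{Y_{u(i)}} n(u)\,du}{\int_0^{Y_{u(k)}} n(u)\,du},$$

i.e.,

$$\sum_{i=1}^{k-1}\frac{\int_0^{Y_{u(i)}}\lambda_0(u)n(u)\,du}{\int_0^{Y_{u(k)}}\lambda_0(u)n(u)\,du} \le \sum_{i=1}^{k-1}\frac{\int_0^{Y_{u(i)}} n(u)\,du}{\int_0^{Y_{u(k)}} n(u)\,du} = V_k.$$

Since by Theorem 2.3.1,

$$\sum_{i=1}^{k-1}\frac{\sum_{j=1}^{i} U_j}{\sum_{j=1}^{k} U_j} \stackrel{d}{=} \sum_{i=1}^{k-1}\frac{\int_0^{Y_{u(i)}}\lambda_0(u)n(u)\,du}{\int_0^{Y_{u(k)}}\lambda_0(u)n(u)\,du},$$

the proof is complete.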
We reject the null hypothesis that $\lambda_0(t)$ is constant if the value of $V_k$ exceeds the critical number given in Table 2.1. In practice, we must replace $\beta$ by $\hat\beta$, the maximum likelihood estimator of $\beta$ from the marginal likelihood, to obtain an asymptotically valid test.
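
As an illustration of how $V_k$ in Theorem 2.3.2 might be computed from data, the following Python sketch evaluates the integrals of $n(u)$ in closed form. It is a sketch under stated assumptions, not the author's program: the function name `vk_statistic` is hypothetical, the covariate is a single scalar, and the risk set at $u$ is taken to be every subject whose observed time is at least $u$, so that $\int_0^y n(u)\,du = \sum_l \exp(z_l\beta)\min(y_l, y)$.

```python
import math

def vk_statistic(times, events, z, beta):
    """V_k = sum_{i<k} A(y_u(i)) / A(y_u(k)), where A(y) = integral_0^y n(u) du.

    Assumption: n(u) sums exp(z_l * beta) over subjects with observed time >= u,
    so A(y) = sum_l exp(z_l * beta) * min(y_l, y).
    """
    weights = [math.exp(zl * beta) for zl in z]
    failures = sorted(t for t, d in zip(times, events) if d)
    if len(failures) < 2:
        raise ValueError("need at least two observed failures")

    def cumulative_risk(y):
        # Weighted total time on test accumulated up to time y.
        return sum(w * min(t, y) for w, t in zip(weights, times))

    total = cumulative_risk(failures[-1])
    return sum(cumulative_risk(y) for y in failures[:-1]) / total

# Toy data (not from the source): three failures, no covariate effect.
print(vk_statistic([1, 2, 3], [1, 1, 1], [0, 0, 0], 0.0))
```

With $\beta = 0$ and no censoring this reduces to the usual scaled total-time-on-test statistic; for the toy data the cumulative totals are 3, 5 and 6, so $V_3 = (3 + 5)/6$.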


Example.  Consider a data set generated by using the following proportional hazard model,

$$\lambda(t; z) = 2t\exp(2z),$$

where $z$ is the given covariate value. (See Section 6.2.) Suppose we have two groups of patients, e.g., male and female. We are interested in testing whether or not the hazard rate is increasing, regardless of sex. The data set generated is shown below.


Male  :  0.0248   0.0361*  0.0452*  0.0821   0.125*   0.1489*
         0.1596   0.2008*  0.2017   0.2144*  0.2352*  0.2469*
         0.2749   0.3017   0.3045*  0.3164*  0.3189   0.3797*
         0.4907   0.6531

Female:  0.0511   0.0611   0.0625   0.0766   0.0768   0.2216
         0.2294   0.2604   0.2984   0.3149   0.3662   0.5187
         0.7587   0.7906   0.8859   0.9317   0.9442   1.0004
         1.0810   1.5018

* indicates censored observations.


We would like to see if we obtain the same result using $\hat\beta = 1.932$ as the result using $\beta = 2.0$. When $\beta = 2.0$, $V_k = 17.79$, while when $\hat\beta = 1.932$, $V_k = 17.70$. We obtain $z = 2.48$ by standardizing $V_k$. Hence, we reach the same conclusion, that the baseline hazard rate is increasing, at the 5% significance level. Therefore, this example supports the validity of replacing $\beta$ with $\hat\beta$ obtained from the marginal likelihood to perform the test.
2.4  A Graphical Method to Test Isotonicity with Covariates

We intend to develop a graphical procedure which allows us to visually examine whether or not the baseline hazard function is constant.

When the baseline hazard is constant, the corresponding failure time distribution, $F_0$, is an exponential distribution. Epstein (1960) introduced a graphical procedure for checking to see if the underlying distribution is really exponential. He plots $y = \log(1/(1 - F(t)))$ against $t$, where the cumulative distribution $F(t)$ is

$$F(t) = \begin{cases} 0 & t < 0 \\ 1 - \exp(-t/\theta) & t \ge 0 \end{cases}$$

assuming that $\theta > 0$.

If the failure rate is increasing, it is not difficult to see that $-\log(1 - F(t))$ is convex on $(0, \infty)$. To extend the idea suggested by Epstein to test for the baseline hazard rate in Cox's regression model, we estimate the survival function of failure time $T$, given $Z = z$.

Turning to our problem, note that the survival function of the failure time $T$, given $Z = z$, is given as

$$S(t; z) = S_0(t)^{\exp(z\beta)}$$

where $S_0(t)$ is an arbitrary survival function. First we consider the calculation of the nonparametric maximum likelihood estimate of $S_0(t)$ (Kalbfleisch & Prentice, 1980). The probability that an individual with covariate $z_{(i)}$ fails at $y_{u(i)}$ is

$$S_0(y_{u(i)})^{\exp(z_{(i)}\beta)} - S_0(y_{u(i)} + 0)^{\exp(z_{(i)}\beta)}$$

where $S_0(y_{u(i)} + 0)$ is the right limit of $S_0$ at $y_{u(i)}$. We assume that the contribution to the likelihood of a survival time censored at $t$ is

$$S_0(t + 0)^{\exp(z\beta)}.$$

In effect, the observed censoring time, $t$, tells us only that the unobserved failure time is greater than $t$. Thus we obtain the likelihood function

$$\mathcal{L} = \prod_{i=0}^{k}\Big[\big\{S_0(y_{u(i)})^{\exp(z_{(i)}\beta)} - S_0(y_{u(i)} + 0)^{\exp(z_{(i)}\beta)}\big\}\prod_{j=1}^{m_i} S_0(y_{c(ij)} + 0)^{\exp(z_{(ij)}\beta)}\Big]. \tag{2.8}$$

It is clear that

$$S_0(t) = S_0(y_{u(i)} + 0)$$

for $y_{u(i)} < t \le y_{u(i+1)}$, in order not to make $\mathcal{L} = 0$. In other words, the solution is a discrete distribution.

Let $1 - \alpha_i$ be the hazard rate at $y_{u(i)}$, i.e.,

$$1 - \alpha_i = \Pr(T = y_{u(i)} \mid T \ge y_{u(i)}), \qquad i = 1, \cdots, k.$$

Then we have

$$\Pr(T \ge y_{u(i)}) = S_0(y_{u(i)}) = S_0(y_{u(i-1)} + 0) = \prod_{j=0}^{i-1}\alpha_j, \qquad i = 1, \cdots, k, \tag{2.9}$$

where $\alpha_0 = 1$. Note that the LHS of (2.9) is equivalent to

$$\exp\Big[-\int_0^{y_{u(i)}}\lambda_0(u)\,du\Big]$$

by definition of the survival function $S_0(u)$. That is, we have

$$\exp\Big[-\int_0^{y_{u(i)}}\lambda_0(u)\,du\Big] = \prod_{j=0}^{i-1}\alpha_j, \qquad i = 1, \cdots, k.$$

Taking the logarithm of both sides, we obtain

$$-\sum_{j=0}^{i-1}\log\alpha_j = \int_0^{y_{u(i)}}\lambda_0(u)\,du.$$

When $\lambda_0(u)$ is constant, we can see that $-\sum_{j=0}^{i-1}\log\alpha_j$ is a linear function of $y_{u(i)}$, while it is a convex function of $y_{u(i)}$ when $\lambda_0$ is increasing.

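Since the $\alpha_j$'s are unknown, as an asymptotic approximation we set $\alpha_j = \hat\alpha_j$, the maximum likelihood estimators of the $\alpha_j$'s, $j = 1, \cdots, k - 1$. In order to obtain the maximum likelihood estimators of the $\alpha_j$'s, the likelihood function (2.8) can be rewritten as

$$
\begin{aligned}
\mathcal{L} &= \prod_{i=1}^{k}\Big[\Big(\prod_{j=0}^{i-1}\alpha_j\Big)^{\exp(z_{(i)}\beta)} - \Big(\prod_{j=0}^{i}\alpha_j\Big)^{\exp(z_{(i)}\beta)}\Big]\prod_{l=1}^{m_i}\Big(\prod_{j=0}^{i}\alpha_j\Big)^{\exp(z_{(il)}\beta)} \\
&= \prod_{i=1}^{k}\big(1 - \alpha_i^{\exp(z_{(i)}\beta)}\big)\Big(\prod_{j=0}^{i-1}\alpha_j\Big)^{\exp(z_{(i)}\beta)}\prod_{l=1}^{m_i}\Big(\prod_{j=0}^{i}\alpha_j\Big)^{\exp(z_{(il)}\beta)} \\
&= \prod_{i=1}^{k}\big(1 - \alpha_i^{\exp(z_{(i)}\beta)}\big)\prod_{i=1}^{k}\prod_{j=0}^{i}\alpha_j^{\{\exp(z_{(i)}\beta) + \sum_{l=1}^{m_i}\exp(z_{(il)}\beta)\}}\prod_{i=1}^{k}\alpha_i^{-\exp(z_{(i)}\beta)} \\
&= \prod_{i=1}^{k}\big(1 - \alpha_i^{\exp(z_{(i)}\beta)}\big)\prod_{j=1}^{k}\alpha_j^{\sum_{i=j}^{k}\{\exp(z_{(i)}\beta) + \sum_{l=1}^{m_i}\exp(z_{(il)}\beta)\} - \exp(z_{(j)}\beta)} \\
&= \prod_{i=1}^{k}\big(1 - \alpha_i^{\exp(z_{(i)}\beta)}\big)\,\alpha_i^{\sum_{l \in R(y_{u(i)})}\exp(z_l\beta) - \exp(z_{(i)}\beta)}
\end{aligned}
\tag{2.10}
$$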


by using (2.9). We now replace $\beta$ with $\hat\beta$, which is estimated from its marginal likelihood, and then maximize (2.10) with respect to $\alpha_1, \cdots, \alpha_k$. Differentiating the logarithm of (2.10) with respect to $\alpha_i$, we obtain the normal equation,

$$-\frac{\exp(z_{(i)}\hat\beta)\,\hat\alpha_i^{\exp(z_{(i)}\hat\beta)}}{1 - \hat\alpha_i^{\exp(z_{(i)}\hat\beta)}} + \sum_{l \in R(y_{u(i)})}\exp(z_l\hat\beta) - \exp(z_{(i)}\hat\beta) = 0. \tag{2.11}$$

We can obtain the maximum likelihood estimate of $\alpha_i$ as a solution to (2.11), i.e.,

$$\hat\alpha_i = \Big(1 - \frac{\exp(z_{(i)}\hat\beta)}{\sum_{l \in R(y_{u(i)})}\exp(z_l\hat\beta)}\Big)^{\exp(-z_{(i)}\hat\beta)}.$$

Note that an iteration method is required to obtain the maximum likelihood estimators of the $\alpha_i$ when we have multiple individuals failing at $y_{u(i)}$.

In practice, we simply plot $-\sum_{j=0}^{i-1}\log\hat\alpha_j$ against $y_{u(i)}$, $i = 1, \cdots, k$, to apply the graphical method. If the baseline hazard function is constant, then the plot should be roughly linear, while if the baseline hazard rate is monotone increasing, then the plot will tend to be a convex curve.
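
The plotting coordinates of this graphical method can be computed directly from the closed-form $\hat\alpha_i$. The Python sketch below is a minimal illustration, assuming distinct failure times (so no iteration is needed), a scalar covariate, and a risk set consisting of all subjects whose observed time is at least $y_{u(i)}$; the function name `cumulative_log_alpha` is ours, and $\hat\beta$ is passed in as a known number.

```python
import math

def cumulative_log_alpha(times, events, z, beta):
    """Return [(y_u(i), -sum_{j<i} log(alpha_hat_j))] for each observed failure.

    alpha_hat_i = (1 - e_i / S_i) ** exp(-z_i * beta), with e_i = exp(z_i * beta)
    and S_i the sum of exp(z_l * beta) over subjects with y_l >= y_u(i).
    """
    w = [math.exp(zl * beta) for zl in z]
    order = sorted((t, i) for i, (t, d) in enumerate(zip(times, events)) if d)
    points, running = [], 0.0           # alpha_0 = 1 contributes log(1) = 0
    for t, i in order:
        points.append((t, running))     # value plotted against y_u(i)
        risk = sum(wl for wl, tl in zip(w, times) if tl >= t)
        ratio = 1.0 - w[i] / risk
        if ratio > 0.0:                 # the last failure can empty the risk set
            running -= math.exp(-z[i] * beta) * math.log(ratio)
    return points

# Toy data (not from the source): three uncensored failures, beta = 0.
for y, h in cumulative_log_alpha([1, 2, 3], [1, 1, 1], [0, 0, 0], 0.0):
    print(y, h)
```

Plotting the second coordinate against the first and judging whether the points look linear or convex is then the visual check described in the text.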


CHAPTER  3 

ESTIMATION  OF  THE  BASELINE  HAZARD  FUNCTION 


3.1  Notation. 


In the remainder of this dissertation we assume that the baseline hazard function, $\lambda_0(t)$, in Cox's regression model increases monotonically. Breslow (1974) approximates the baseline hazard function by a step function with discontinuities at each observed failure time. However, the maximum likelihood estimates of the baseline hazard function are inefficient, since they do not take into account the monotonicity of $\lambda_0(t)$.

Let $n$ be the number of individuals in the sample. We shall define a random variable representing failure time as $T$, where the observed failure time is $t$. Let $Z$ be a row vector of $s$ measured covariates. Let $T_1, \cdots, T_n$ and $C_1, \cdots, C_n$ be the independent random variables of failure times and censoring times, respectively. We observe $(Y_1, \delta_1), \cdots, (Y_n, \delta_n)$ where

$$Y_i = T_i \wedge C_i, \qquad \delta_i = I(T_i \le C_i)$$

with corresponding covariates $Z_1, \cdots, Z_n$. Define $Y_{u(i)}$ to be the $i$th order statistic from uncensored failure times for $i = 1, \ldots, k$ ($k \le n$). Let $Z_{u(i)}$ represent the corresponding covariate of $Y_{u(i)}$.

As defined in (1.1), the proportional hazard model is given by

$$\lambda(t; z) = \lambda_0(t)\exp(z\beta).$$

The survival function is defined by

$$S(t; z) = \exp\Big[-\int_0^t \lambda(u; z)\,du\Big]$$

and the density function by

$$f(t; z) = \lambda(t; z)S(t; z)$$

given that $Z = z$.

Breslow (1974) obtains the maximum likelihood estimator of $\lambda_0(t)$ without the assumption of increasing failure rate. In this chapter we derive the maximum likelihood estimator of $\lambda_0(t)$ under the assumption of increasing failure rate. We approximate the baseline hazard function by a nondecreasing step function with discontinuities at each observed failure time; that is,

$$\lambda_0(t) = \lambda_i, \qquad y_{u(i-1)} < t \le y_{u(i)}, \tag{3.1}$$

where $\lambda_i \le \lambda_{i+1}$ for $i = 1, \cdots, k - 1$, $\lambda_0 = 0$, $y_{u(0)} = 0$, and $y_{u(k+1)} = \infty$. For estimation purposes, we assume that an individual censored in the interval $[y_{u(i-1)}, y_{u(i)})$ is censored at $y_{u(i-1)}$. This approach is similar to Breslow (1974).

The likelihood corresponding to the observations described at the beginning of this section is

$$\mathcal{L}_0 = \prod_{i=1}^{k}\Big[\lambda_0(y_{u(i)})^{d_i}\exp(s_{u(i)}\beta)\exp\Big\{-\int_{y_{u(i-1)}}^{y_{u(i)}}\lambda_0(u)\,du\sum_{l \in R(y_{u(i)})}\exp(z_l\beta)\Big\}\Big]$$

where $d_i$ is the number of individuals who failed at $y_{u(i)}$ and $s_{u(i)}$ is the sum of covariates of individuals failing at $y_{u(i)}$. $R(y_{u(i)})$ is the set of labels attached to the individuals who either fail or are censored observations at $y_{u(i)}$. particular way, for example,

Using (3.1), $\mathcal{L}_0$ reduces to

$$\mathcal{L}_1 = \prod_{i=1}^{k}\Big[\lambda_i^{d_i}\exp(s_{u(i)}\beta)\exp\Big\{-\lambda_i(y_{u(i)} - y_{u(i-1)})\sum_{l \in R(y_{u(i)})}\exp(z_l\beta)\Big\}\Big]$$

where $R(y_{u(i)})$ is the risk set prior to $y_{u(i)}$.

To maximize $\mathcal{L}_1$, we consider the logarithm of $\mathcal{L}_1$, denoted by $\mathcal{L}_2$,

$$\mathcal{L}_2 = \sum_{i=1}^{k}\Big\{d_i\log\lambda_i + s_{u(i)}\beta - \lambda_i(y_{u(i)} - y_{u(i-1)})\sum_{l \in R(y_{u(i)})}\exp(z_l\beta)\Big\}. \tag{3.2}$$


In the remainder of this chapter, the term "maximum likelihood estimator" of $\lambda_0(t)$ will refer to the solution $\hat\lambda_i$, $i = 1, \cdots, k$, which maximizes $\mathcal{L}_2$ subject to $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_k$. In the next section, we apply the isotonic regression method to find the maximum likelihood estimator of $\lambda_0(t)$. In the following chapter, we shall replace $\beta$ by the marginal likelihood estimator of $\beta$ to find the maximum likelihood estimator of $\lambda_0(t)$.


3.2  Isotonic  Regression 

We shall introduce isotonic regression through an example. The importance of isotonic regression can be illustrated by the classical two-sample problem, in which the mean responses for two treatments are compared. If treatment 1 is a standard crop treatment, while treatment 2 is an experimental treatment (standard plus fertilizer), then it may be possible to assert that $\mu_2 \ge \mu_1$, where $\mu_i$ is the population mean amount of produce per acre under treatment $i$ ($i = 1, 2$). Suppose we want to test the hypothesis $\mu_1 = \mu_2$. Since significance comes from an observed difference, say $\bar y_1 - \bar y_2$, we perform a one-sided test. If we ignore the ordering assumption, then we would use the standard two-sample $t$ test for the equality of two treatments. It is well known that the one-sided test is more powerful than the two-sided test, since the two-sided test makes no use of the prior information that $\mu_1 \le \mu_2$. By taking such information into account, the one-sided test gives higher power to test the difference between the treatments. Isotonic regression analysis provides a method of using the ordering assumption about the parameters.

We need the following definitions to introduce the isotonic regression.

Let $X$ be the finite set $\{x_1, x_2, \cdots, x_k\}$ with the simple order $x_1 < x_2 < \cdots < x_k$ and let $\omega$ be a positive weight function. A function $f$ on $X$ is isotonic if $f(x_1) \le f(x_2) \le \cdots \le f(x_k)$.

Suppose $g$ is a given function on $X$. A function $g^*$ on $X$ is an isotonic regression of $g$ with weights $\omega$ if and only if $g^*$ is isotonic and $g^*$ minimizes

$$\sum_{x \in X}[g(x) - f(x)]^2\omega(x) \tag{3.3}$$

in the class of all isotonic functions $f$ on $X$.

Robertson et al. (1988) show in the following theorem that, for a finite set $X$, isotonic estimators reduce error.

Theorem 3.2.1  Suppose we have a quasi-order on a finite set $X$. If $\hat\theta$ is any function on $X$ and if $\hat\theta^*$ is the isotonic estimator of $\hat\theta$ with weights $\omega$, then

$$\sum_{x \in X}\Psi[\hat\theta^*(x) - \theta(x)]\omega(x) \le \sum_{x \in X}\Psi[\hat\theta(x) - \theta(x)]\omega(x)$$

for any convex function $\Psi$ on $(-\infty, \infty)$ and any isotonic function $\theta$ on $X$.

Proof of Theorem 3.2.1  See Robertson et al. (1988, p. 41).

Theorem 3.2.1 states that the isotonic estimator of $\lambda_0$ reduces error in a number of ways, as seen by taking $\Psi(t) = |t|^p$, $p \ge 1$. For example, with $p = 1$, we can see that $\lambda_0^*$ has less total absolute error, and with $p = \infty$, $\lambda_0^*$ has less maximum absolute error.
Let us consider a graphical interpretation of (3.3).

In Figure 3.1, $(g - f)^2$ can be interpreted as the excess of the rise in the graph of the function $\Psi(u) = u^2$ from $f$ to $g$ over the rise of its tangent line at $f$. Clearly, we see

$$(g - f)^2 = g^2 - f^2 - (g - f)2f = \Psi(g) - \Psi(f) - (g - f)\psi(f)$$

where $\psi(f)$ is the derivative of $\Psi(u)$ at $f$. This excess is nonnegative for every convex function $\Psi(u)$, whether $f < g$ or $g < f$. It is strictly positive if $\Psi(u)$ is strictly convex and $f \ne g$.

Figure 3.1. Graphical interpretation of the square error measure of discrepancy. The graph of $\Psi(u) = u^2$ is drawn.

Let us generalize the square error measure (3.3), replacing $\Psi(u) = u^2$ by any convex function. Let $\Psi(u)$ be a convex function, which is finite on an open interval $I$ containing the range of the function $g$ and infinite elsewhere. Denote the discrepancy of $(f, g)$ by

$$\Delta_\Psi(g(x), f(x)) = \begin{cases}\Psi(g) - \Psi(f) - (g - f)\psi(f) & f(x), g(x) \in I \\ \infty & \text{otherwise,}\end{cases} \tag{3.4}$$

where $\psi(f)$ is the derivative of $\Psi(u)$ at $f$. If $\Psi(u)$ does not have a derivative at $f$, then $\psi(f)$ denotes any number between the left and right derivative at $f$. From (3.4), it can be seen that

$$\Delta_\Psi(r, t) = \Delta_\Psi(r, s) + \Delta_\Psi(s, t) + (r - s)[\psi(s) - \psi(t)]$$

if $r$, $s$ and $t$ are in the domain of $\Psi$.

Theorem 3.2.2  Let $\Psi$ be a convex function which is finite in an open interval $I$ containing the range of the function $g$ and infinite elsewhere. If $g^*$ is the isotonic regression of $g$, $f$ is isotonic on $X$, and the range of $f$ is contained in $I$, then

$$\sum_{x}\Delta_\Psi[g(x), f(x)]\omega(x) \ge \sum_{x}\Delta_\Psi[g(x), g^*(x)]\omega(x) + \sum_{x}\Delta_\Psi[g^*(x), f(x)]\omega(x).$$

Consequently $g^*$ minimizes

$$\sum_{x}\Delta_\Psi[g(x), f(x)]\omega(x)$$

in the class of isotonic $f$ with range in $I$ and maximizes

$$\sum_{x}\{\Psi[f(x)] + [g(x) - f(x)]\psi[f(x)]\}\omega(x). \tag{3.5}$$

The maximizing (minimizing) function is unique if $\Psi$ is strictly convex.

Proof  of  Theorem  3.2.2  See  Barlow  et  al.(1972). 

Corollary 3.2.1  Let $\psi_1, \psi_2, \cdots, \psi_p$ be arbitrary real valued functions and let $h_1, h_2, \cdots, h_m$ be isotonic functions on $X$. Then $g^*$ minimizes

$$\sum_{x}\Delta_\Psi(g, f)\omega(x)$$

in the class of isotonic functions $f$ with range in $I$ satisfying any or all of the side conditions

$$\sum_{x}[g(x) - f(x)]\psi_j[f(x)]\omega(x) = 0, \qquad j = 1, 2, \cdots, p, \tag{3.6}$$

$$\sum_{x} f(x)h_j(x)\omega(x) \ge \sum_{x} g(x)h_j(x)\omega(x), \qquad j = 1, 2, \cdots, m.$$

Theorem 3.2.2 and Corollary 3.2.1 can be used to show that the isotonic regression provides a solution for a wide variety of estimation problems with order restriction in which the objective function is other than least squares. The effort for solving these problems is focused on finding the appropriate choice of the function $\Psi(x)$.

Example.  (Barlow et al., 1972) Suppose that for each of various levels $x$ of a stimulus (e.g., dose of insecticide) the probability of a response (e.g., death of the insect) is $\mu(x)$. We would like to estimate $\mu(x)$, known to be nondecreasing in $x$. If $X$ is the finite set of stimulus levels, it is simply ordered by the dosage levels. Suppose that for $x \in X$, there are $m(x)$ independent trials at stimulus level $x$, $a(x)$ responses occur, and $\bar y(x) = a(x)/m(x)$ is the average number of responses per trial.

If $m(x)$ is large for each $x$, the ratios $\bar y(x)$, which are natural estimates of the probabilities $\mu(x)$, can be expected to be in increasing order. But if some consecutive ratios $\bar y(x)$ have reversed ordering, another estimator would be required. The isotonic regression of $\bar y$ with weights $m(x)$ is an obvious candidate.

Let $b(x) = m(x) - a(x)$ denote the number of nonresponses among $m(x)$ trials at stimulus level $x$. If $f(x)$ denotes an arbitrary function bounded between 0 and 1 on $X$, the likelihood at $f$ of the sample is proportional to

$$\prod_{x \in X} f(x)^{a(x)}[1 - f(x)]^{b(x)}$$

and the negative log-likelihood can be written, up to an additive constant, as

$$-\sum_{x \in X}\{\bar y(x)\log f(x) + [1 - \bar y(x)]\log[1 - f(x)]\}m(x). \tag{3.7}$$
Thus the solution of the problem of maximum likelihood estimation of $\mu$ is the function that minimizes (3.7) over the class of isotonic functions on $X$.

Consider the following convex function:

$$\Psi(u) = u\log u + (1 - u)\log(1 - u), \quad 0 < u < 1; \qquad \Psi(0) = 0, \qquad \Psi(1) = 0. \tag{3.8}$$

Then,

$$\Delta(g, f) = g\log g + (1 - g)\log(1 - g) - g\log f - (1 - g)\log(1 - f). \tag{3.9}$$

Noting that the first two terms on the right of (3.9) do not involve $f$, the problem of finding the maximum likelihood estimator of $\mu(x)$ is equivalent to finding the $f$ which minimizes the discrepancy determined by the convex function (3.8). Theorem 3.2.2 states that $\bar y^*(x)$, the isotonic regression of $\bar y(x)$, minimizes

$$\sum_{x \in X}\Big\{\bar y(x)\log\bar y(x) + [1 - \bar y(x)]\log[1 - \bar y(x)] - \bar y(x)\log f(x) - [1 - \bar y(x)]\log[1 - f(x)]\Big\}m(x),$$

or $\sum_{x \in X}\Delta(\bar y(x), f(x))m(x)$.

Recall that our problem is to maximize (3.2),

$$\mathcal{L}_2 = \sum_{i=1}^{k}\Big\{d_i\log\lambda_i + s_{u(i)}\beta - \lambda_i(y_{u(i)} - y_{u(i-1)})\sum_{l \in R(y_{u(i)})}\exp(z_l\beta)\Big\}, \tag{3.10}$$

subject to $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_k$, assuming that all failures are distinct, i.e., $d_i = 1$. To obtain the maximum likelihood estimator of $\lambda_i$, $i = 1, \cdots, k$, by applying the previous theorem and corollary, we define $\Psi(u) = u\log u$. Then we obtain

$$\Delta(g, f) = g\log g - g\log f - (g - f).$$
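
This discrepancy can be verified directly from the general form $\Delta_\Psi(g, f) = \Psi(g) - \Psi(f) - (g - f)\psi(f)$. The short derivation below is added only as a check; it uses nothing beyond $\psi(f) = \Psi'(f) = \log f + 1$ for $\Psi(u) = u\log u$.

```latex
\begin{aligned}
\Delta_\Psi(g, f) &= g\log g - f\log f - (g - f)(\log f + 1) \\
                  &= g\log g - f\log f - g\log f + f\log f - (g - f) \\
                  &= g\log g - g\log f - (g - f).
\end{aligned}
```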

Since $g\log g$ does not depend on $f$, Theorem 3.2.2 implies that the isotonic regression $g^*$ of $g$ maximizes

$$\sum_{x}[g(x)\log f(x) + g(x) - f(x)]\omega(x), \tag{3.11}$$

in the class of positive isotonic functions. By Corollary 3.2.1, with $\psi = 1$, $g^*$ also maximizes

$$\sum_{x}[g(x)\log f(x) - f(x)]\omega(x), \tag{3.12}$$

in the class of positive isotonic functions $f$ satisfying

$$\sum_{x}[g(x) - f(x)]\omega(x) = 0. \tag{3.13}$$

Since $s_{u(i)}$ is independent of $\lambda_i$, the problem of (3.10) is equivalent to maximizing

$$\sum_{i=1}^{k}\Big\{\frac{1}{(y_{u(i)} - y_{u(i-1)})\sum_{l \in R(y_{u(i)})}\exp(z_l\beta)}\log\lambda_i - \lambda_i\Big\}(y_{u(i)} - y_{u(i-1)})\sum_{l \in R(y_{u(i)})}\exp(z_l\beta) \tag{3.14}$$

subject to $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_k$.

Let $X = \{1, 2, \ldots, k\}$, $f(i) = \lambda_i$,

$$g(i) = \frac{1}{(y_{u(i)} - y_{u(i-1)})\sum_{l \in R(y_{u(i)})}\exp(z_l\beta)}$$

and

$$\omega(i) = (y_{u(i)} - y_{u(i-1)})\sum_{l \in R(y_{u(i)})}\exp(z_l\beta),$$

$i = 1, \cdots, k$. By substitution, we can see that the expression (3.12) is the same as (3.14). It can be seen that the solution $\hat\lambda_1, \hat\lambda_2, \cdots, \hat\lambda_k$ that maximizes (3.14) satisfies

$$\sum_{i=1}^{k}\Big\{\frac{1}{(y_{u(i)} - y_{u(i-1)})\sum_{l \in R(y_{u(i)})}\exp(z_l\beta)} - \hat\lambda_i\Big\}(y_{u(i)} - y_{u(i-1)})\sum_{l \in R(y_{u(i)})}\exp(z_l\beta) = 0. \tag{3.15}$$

If $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_k$ and $\rho > 0$, then we have $\rho\lambda_1 \le \rho\lambda_2 \le \cdots \le \rho\lambda_k$. It is easily seen that

$$\sum_{i=1}^{k}\Big\{\frac{1}{(y_{u(i)} - y_{u(i-1)})\sum_{l \in R(y_{u(i)})}\exp(z_l\beta)}\log(\rho\hat\lambda_i) - \rho\hat\lambda_i\Big\}(y_{u(i)} - y_{u(i-1)})\sum_{l \in R(y_{u(i)})}\exp(z_l\beta) \tag{3.16}$$

achieves its maximum as a function of $\rho$ at $\rho = 1$. Substituting 1 for $\rho$ in (3.16) yields (3.14). On setting its derivative at $\rho = 1$ to zero, we obtain (3.15), which implies that the solution $\hat\lambda_1, \hat\lambda_2, \cdots, \hat\lambda_k$ satisfies the condition (3.13).

We can derive that the isotonic regression $g^*$ is the maximum likelihood estimator of $\lambda_i$, $i = 1, \cdots, k$, where $\lambda_0(t) = \lambda_i$ for $y_{u(i-1)} < t \le y_{u(i)}$, since $g^*$ maximizes (3.12) subject to (3.13), and hence it also maximizes (3.14) subject to (3.15) and $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_k$.

Now we consider the graphical interpretation of the isotonic regression $g^*$. Assuming the ordering $x_1 < x_2 < \cdots < x_k$, plot the points $P_j = (W_j, G_j)$ in the Cartesian plane, where

$$G_j = \sum_{i=1}^{j} g(x_i)\omega(x_i)$$

and

$$W_j = \sum_{i=1}^{j}\omega(x_i), \qquad j = 1, \cdots, k.$$

Let $P_0 = (0, 0)$. These points form the cumulative sum diagram (CSD) of the given function $g$ with weight $\omega$. The slope of the chord joining $P_{i-1}$ to $P_j$ ($i \le j$) represents the weighted average

$$Av\{x_i, x_{i+1}, \cdots, x_j\} = \sum_{r=i}^{j} g(x_r)\omega(x_r) \Big/ \sum_{r=i}^{j}\omega(x_r).$$

It is clear that $g(x_j)$, $j = 1, \cdots, k$, is the slope of the segment joining $P_{j-1}$ to $P_j$.

It is well known that the greatest convex minorant (GCM) of the CSD is the graph of the supremum of all convex functions whose graphs lie below the CSD. Let us next consider the graphical method for the GCM. First draw a line for which the entire CSD lies on or above it. If it intersects the CSD in more than one point, then the segment joining its leftmost and rightmost intersections becomes a part of the graph of the GCM. The GCM is made up of such segments. Graphically, the GCM is the path along which a taut string lies if it joins $P_0$ and $P_k$ and is constrained to lie below the CSD. The value of the isotonic regression $g^*$ at a point $x_j$ is just the slope of the GCM at the point $P_j^*$ with abscissa

$$\sum_{i=1}^{j}\omega(x_i).$$

Table 3.1 and Figure 3.2 clarify the concepts.
Table 3.1. Example of CSD and GCM

 j   ω(x_j)   W_j   g(x_j)   G_j   G*_j   g*(x_j)
 1     1       1     -2      -2    -2      -2
 2     2       3     5/2      3    -8/5    1/5
 3     3       6    -4/3     -1    -1      1/5
 4     2       8      1       1     1       1

Remark: $W_j = \sum_{i=1}^{j}\omega(x_i)$, $G_j = \sum_{i=1}^{j} g(x_i)\omega(x_i)$, $G_j^* = \sum_{i=1}^{j} g^*(x_i)\omega(x_i)$, $j = 1, 2, 3, 4$.

Figure 3.2. Example of CSD and GCM
Slope at $P_j$ of CSD: $(G_j - G_{j-1})/(W_j - W_{j-1}) = g(x_j)$;
Slope at $P_j^*$ of GCM: $(G_j^* - G_{j-1}^*)/(W_j - W_{j-1}) = g^*(x_j)$, $j = 1, 2, 3, 4$
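
The entries of Table 3.1 can be reproduced with the pool-adjacent-violators algorithm, a standard way of computing the slopes of the GCM. The Python sketch below is an illustration under our own naming (`pava` is not a function from the source); exact fractions are used so that the output can be compared with the table entry by entry.

```python
from fractions import Fraction
from itertools import accumulate

def pava(g, w):
    """Weighted nondecreasing isotonic regression by pooling adjacent violators."""
    blocks = []  # each block holds [sum of g*w, sum of w, number of points]
    for gi, wi in zip(g, w):
        blocks.append([gi * wi, wi, 1])
        # Pool while the previous block mean exceeds the newest block mean
        # (cross-multiplied comparison; all weights are positive).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, wt, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += wt
            blocks[-1][2] += n
    fit = []
    for s, wt, n in blocks:
        fit.extend([s / wt] * n)   # every point in a block gets the block mean
    return fit

# g and weights from Table 3.1.
g = [Fraction(-2), Fraction(5, 2), Fraction(-4, 3), Fraction(1)]
w = [1, 2, 3, 2]
g_star = pava(g, w)
G_star = list(accumulate(gs * wi for gs, wi in zip(g_star, w)))
print(g_star)   # slopes of the GCM
print(G_star)   # ordinates of the GCM at W_1, ..., W_4
```

Only the second and third points violate the ordering, so they are pooled into a single block with mean $(5 - 4)/5 = 1/5$, which is the repeated value in the last column of the table.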
By utilizing the fact that the isotonic regression can be represented graphically as the slope of the GCM (Barlow et al., 1972), we obtain the formula of the isotonic regression of $g$,

$$g^*(x_i) = \max_{s \le i}\min_{t \ge i} Av(s, t)$$

where

$$Av(s, t) = \sum_{r=s}^{t} g(x_r)\omega(x_r) \Big/ \sum_{r=s}^{t}\omega(x_r).$$

Since we have

$$g = \frac{1}{(y_{u(i)} - y_{u(i-1)})\sum_{l \in R(y_{u(i)})}\exp(z_l\beta)}$$

and

$$\omega = (y_{u(i)} - y_{u(i-1)})\sum_{l \in R(y_{u(i)})}\exp(z_l\beta),$$

we obtain the isotonic regression, which is equivalent to the maximum likelihood estimate of $\lambda_i$,

$$\hat\lambda_i^* = \max_{s \le i}\min_{t \ge i}\frac{t - s + 1}{\sum_{j=s}^{t}(y_{u(j)} - y_{u(j-1)})\sum_{l \in R(y_{u(j)})}\exp(z_l\beta)}.$$

The functions $g$ and $\omega$ are dependent upon the unknown regression parameter, $\beta$, which must be replaced by a consistent estimator. Typically, we use the value $\hat\beta$ which is obtained from the marginal likelihood by a Newton-Raphson iteration.
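
The max-min formula can be evaluated directly once the weights $\omega(j) = (y_{u(j)} - y_{u(j-1)})\sum_{l \in R(y_{u(j)})}\exp(z_l\beta)$ are available. The Python sketch below is a minimal illustration rather than the author's implementation: the function name `isotonic_baseline_hazard` is hypothetical, failure times are assumed distinct, the covariate is scalar, the risk set is taken to be all subjects with observed time at least $y_{u(j)}$, and $\hat\beta$ is supplied as a known number instead of being estimated.

```python
import math

def isotonic_baseline_hazard(times, events, z, beta):
    """lambda*_i = max_{s<=i} min_{t>=i} (t - s + 1) / sum_{j=s}^{t} w_j,

    with w_j = (y_u(j) - y_u(j-1)) * sum of exp(z_l * beta) over y_l >= y_u(j).
    """
    risk_w = [math.exp(zl * beta) for zl in z]
    failures = sorted(t for t, d in zip(times, events) if d)
    w, prev = [], 0.0
    for y in failures:
        at_risk = sum(rw for rw, t in zip(risk_w, times) if t >= y)
        w.append((y - prev) * at_risk)
        prev = y
    k = len(w)
    # prefix[j] = w_1 + ... + w_j, so a block sum is a difference of two entries.
    prefix = [0.0]
    for wj in w:
        prefix.append(prefix[-1] + wj)
    hazards = []
    for i in range(1, k + 1):
        hazards.append(max(
            min((t - s + 1) / (prefix[t] - prefix[s - 1]) for t in range(i, k + 1))
            for s in range(1, i + 1)
        ))
    return failures, hazards

# Toy data (not from the source): the unrestricted estimates 1/3, 1/4, 1
# violate the ordering, so the first two intervals are pooled.
print(isotonic_baseline_hazard([1, 3, 4], [1, 1, 1], [0, 0, 0], 0.0))
```

For the toy data the weights are 3, 4 and 1, and pooling the first two intervals gives the common value $2/7$, after which the estimates are nondecreasing.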


CHAPTER  4 

CONSISTENCY  OF  THE  ISOTONIC  ESTIMATOR 
4.1  Notation  and  Assumption 

In this chapter, we shall prove the strong consistency of the isotonic regression estimator of the baseline hazard rate, for $x_0 \in (y_{u(i)}, y_{u(i+1)}]$, $i = 0, 1, \cdots, k$,

$$\hat\lambda_I(x_0) = \max_{s \le i}\min_{t \ge i}\frac{t - s + 1}{\sum_{j=s}^{t}(y_{u(j)} - y_{u(j-1)})\sum_{l \in R(y_{u(j)})}\exp(z_l\hat\beta)} \tag{4.1}$$

where $\hat\beta$ is the estimator of the regression parameter by marginal likelihood. For simplicity, we shall assume that all failures are distinct and that the covariate $z$ is single valued.

We  shall  use  the  notation  and  formulas  as  defined  by  Tsiatis  (1981). 

Let the covariate $Z$ be a random variable with density $q(z)$ and distribution $Q(z)$. We assume that $Q(z)$ has compact support, i.e., there exists $z_0$ such that $\Pr(0 \le Z \le z_0) = 1$. The Cox regression model links the distribution of the failure time to the covariates $Z$. We assume that $T_0$ is the time when the study ends, so $\Pr(T \le T_0) = 1$. Let $\mu(t|z)$ denote the hazard function of the censoring distribution, given $Z = z$. It follows that the conditional probability of surviving until time $t$ without being censored, given that $Z = z$, is given by

$$H(t|z) = \Pr(T \ge t \mid z) = \exp\Big[-\int_0^t\{\lambda_0(x)\exp(z\beta) + \mu(x|z)\}\,dx\Big]. \tag{4.2}$$
The conditional probability of surviving until $t$ without being censored and eventually dying before being censored, given $Z = z$, is

$$F(t|z) = \Pr(T > t, \delta = 1 \mid z) = \int_t^{T_0} \lambda_0(x)\exp(z\beta)H(x|z)\,dx. \tag{4.3}$$


Furthermore, the probability of surviving until time $t$ without being censored and eventually dying before being censored is

$$F(t) = \Pr(T > t, \delta = 1) = \int F(t|z)q(z)\,dz = \int\!\!\int_t^{T_0} \lambda_0(x)\exp(z\beta)H(x|z)\,dx\,q(z)\,dz. \tag{4.4}$$


We assume that $F(t)$ is continuously differentiable and has an inverse function. The derivative of $F(t)$ is

$$\frac{dF(t)}{dt} = -\int \lambda_0(t)\exp(z\beta)H(t|z)\,dQ(z) = -\lambda_0(t)\int \exp(z\beta)H(t|z)\,dQ(z). \tag{4.5}$$


For $g(z)$, a continuous function of $z$, define

$$E(g(z),t) = E[g(Z)I[T>t]] = E[g(Z)P(T>t \mid Z)] = \int g(z)H(t|z)\,dQ(z) \tag{4.6}$$

and

$$E_1(g(z),t) = E[g(Z)I[T>t,\delta=1]] = E[g(Z)P(T>t,\delta=1 \mid Z)] = E[g(Z)F(t|Z)] = \int\!\!\int_t^{T_0} g(z)\lambda_0(x)\exp(z\beta)H(x|z)\,dx\,dQ(z).$$


By differentiation of $E_1(g(z),t)$ with respect to $t$, we obtain

$$\frac{d}{dt}E_1(g(z),t) = -\lambda_0(t)\int g(z)\exp(z\beta)H(t|z)\,dQ(z).$$

Using (4.5) and (4.6), we further obtain

$$\lambda_0(t) = -\frac{dF(t)}{dt}\Big/ E(\exp(z\beta),t). \tag{4.7}$$
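Identity (4.7) can be checked numerically. The sketch below is an illustration only, under assumptions that are not in the source: a two-point covariate law, the increasing baseline hazard $\lambda_0(t) = 0.5 + t$, a constant censoring hazard, and the parameter values shown. It integrates (4.4) by the trapezoidal rule, differentiates by a central difference, and divides by $E(\exp(z\beta),t)$ from (4.6).

```python
import math

# All numerical values below are hypothetical choices for illustration.
BETA, MU, T_END = 0.7, 0.3, 2.0
Z_VALUES, Z_PROBS = (0.0, 1.0), (0.5, 0.5)


def baseline_hazard(t):
    """Assumed increasing baseline hazard lambda_0(t) = 0.5 + t."""
    return 0.5 + t


def surv_uncensored(t, z):
    """H(t|z) of (4.2) for the hazards assumed above."""
    cumulative = 0.5 * t + 0.5 * t * t      # integral of lambda_0 over [0, t]
    return math.exp(-(cumulative * math.exp(z * BETA) + MU * t))


def big_f(t, steps=4000):
    """F(t) of (4.4), by the trapezoidal rule on [t, T_END]."""
    h = (T_END - t) / steps
    total = 0.0
    for z, q in zip(Z_VALUES, Z_PROBS):
        def integrand(x, z=z):
            return baseline_hazard(x) * math.exp(z * BETA) * surv_uncensored(x, z)
        inner = 0.5 * (integrand(t) + integrand(T_END))
        inner += sum(integrand(t + m * h) for m in range(1, steps))
        total += q * h * inner
    return total


def e_exp(t):
    """E(exp(z beta), t) of (4.6) with g(z) = exp(z beta)."""
    return sum(q * math.exp(z * BETA) * surv_uncensored(t, z)
               for z, q in zip(Z_VALUES, Z_PROBS))


def hazard_from_4_7(t, eps=1e-3):
    """Right-hand side of (4.7), with dF/dt replaced by a central difference."""
    derivative = (big_f(t + eps) - big_f(t - eps)) / (2.0 * eps)
    return -derivative / e_exp(t)
```

Under these assumptions `hazard_from_4_7(1.0)` should agree with `baseline_hazard(1.0)` up to discretization error.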


We can define the usual empirical estimates of $F(x)$ and $E(\exp(z\beta),x)$ by

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I[T_i > x, \delta_i = 1]$$

and

$$E_n(\exp(z\beta),x) = \frac{1}{n}\sum_{i \in R(x)} \exp(z_i\beta) = \int_{\{z:\,T>x\}} \exp(z\beta)\,dQ_n(z) \tag{4.8}$$

respectively, where $Q_n$ is the empirical distribution of $Z$. Further we define

$$E_n(\exp(z\hat\beta),x) = \frac{1}{n}\sum_{i=1}^{n} \exp(z_i\hat\beta)I[T_i > x]$$

and

$$K_\xi(x) = \int_\xi^x E(\exp(z\beta),u)\,du.$$
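The empirical quantities just defined are simple sample averages. A minimal sketch follows; the function names `f_n`, `e_n` and `k_hat` and the four-subject data set are hypothetical, and `k_hat` integrates the step function $u \mapsto E_n(\exp(z\beta),u)$ exactly rather than numerically.

```python
import math


def f_n(x, times, deltas):
    """F_n(x): fraction of subjects with an observed failure after x."""
    n = len(times)
    return sum(1 for t, d in zip(times, deltas) if t > x and d == 1) / n


def e_n(x, times, covariates, beta):
    """E_n(exp(z beta), x): (1/n) * sum of exp(z_i beta) over T_i > x."""
    n = len(times)
    return sum(math.exp(z * beta)
               for t, z in zip(times, covariates) if t > x) / n


def k_hat(xi, x, times, covariates, beta):
    """Integral of u -> e_n(u, ...) over [xi, x], assuming xi <= x.

    Subject i contributes exp(z_i beta) / n for every u in [xi, x] with
    u < T_i, i.e. over a stretch of length max(0, min(T_i, x) - xi).
    """
    n = len(times)
    return sum(math.exp(z * beta) * max(0.0, min(t, x) - xi)
               for t, z in zip(times, covariates)) / n


# Hypothetical sample: observed times, failure indicators, covariates.
times = [2.0, 1.0, 4.0, 3.0]
deltas = [1, 0, 1, 1]
covariates = [0.0, 1.0, 0.0, 1.0]
```

With `beta = 0.0` every subject has weight one, so `e_n` reduces to the fraction still at risk and `k_hat(0.0, 4.0, ...)` to the average observed time.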


4.2 Consistency of Isotonic Estimator of $\lambda_0(t)$

The objective of this subsection is to show that, for fixed $x_0$, $\hat\lambda_i(x_0)$ is a consistent estimator of $\lambda_0(x_0)$, where

$$\hat\lambda_i(x_0) = \max_{s \le i} \min_{t \ge i} \frac{t - s + 1}{\sum_{j=s}^{t} (y_{u(j)} - y_{u(j-1)}) \sum_{l \in R(y_{u(j)})} \exp(z_l\hat\beta)} \tag{4.9}$$

for $x_0 \in (y_{u(i)}, y_{u(i+1)}]$. We can rewrite (4.1) for $x_0 \in (y_{u(i)}, y_{u(i+1)}]$ as

$$\hat\lambda_i(x_0) = \max_{s \le i} \min_{t \ge i} \frac{F_n(y_{u(s-1)}) - F_n(y_{u(t)})}{\hat K_\xi(y_{u(t)}) - \hat K_\xi(y_{u(s-1)})} \tag{4.10}$$

where

$$\hat K_\xi(x) = \int_\xi^x E_n(\exp(z\hat\beta),u)\,du$$

for $y_{u(1)} \le x \le y_{u(n)}$ and $y_{u(1)} \le \xi \le y_{u(n)}$.
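A short worked step, added here as a reading aid, shows why (4.10) agrees with (4.1). It uses $\hat K_\xi(x) = \int_\xi^x E_n(\exp(z\hat\beta),u)\,du$ and the assumption of Section 4.1 that all failures are distinct. It also assumes that the risk set does not change between consecutive ordered failure times, so that $E_n(\exp(z\hat\beta),u)$ is constant on each interval $(y_{u(j-1)}, y_{u(j)})$; this second assumption is ours and is stated only to make the identity exact.

$$F_n(y_{u(s-1)}) - F_n(y_{u(t)}) = \frac{1}{n}\,\#\{i : y_{u(s-1)} < T_i \le y_{u(t)},\ \delta_i = 1\} = \frac{t - s + 1}{n},$$

$$\hat K_\xi(y_{u(t)}) - \hat K_\xi(y_{u(s-1)}) = \int_{y_{u(s-1)}}^{y_{u(t)}} E_n(\exp(z\hat\beta),u)\,du = \frac{1}{n}\sum_{j=s}^{t} (y_{u(j)} - y_{u(j-1)}) \sum_{l \in R(y_{u(j)})} \exp(z_l\hat\beta).$$

Dividing the first line by the second cancels the factor $1/n$ and returns the ratio in (4.1).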

Next we shall prove, for any fixed $x_0$ such that $0 < F(x_0) < 1$,

$$\lambda_0(x_0-) \le \liminf_{n\to\infty} \hat\lambda_i(x_0) \le \limsup_{n\to\infty} \hat\lambda_i(x_0) \le \lambda_0(x_0+), \tag{4.11}$$

where $x_0 \in (y_{u(i)}, y_{u(i+1)}]$ and $i$ depends on the sample.

First we must prove, in the following lemma and corollary, that

$$\sup_{0 \le \xi \le T_0} |\hat K_\xi(x) - K_\xi(x)| \to 0 \quad \text{as } n \to \infty. \tag{4.12}$$

The following lemma implies (4.12).

Lemma 4.2.1 Let $\hat\beta$ be the maximum likelihood estimator of $\beta$ obtained from the marginal likelihood. It follows that

$$\sup_{0 \le u \le T_0} |E_n(\exp(z\hat\beta),u) - E(\exp(z\beta),u)| \to 0 \tag{4.13}$$

where $\Pr(T \le T_0) = 1$.


Proof of Lemma 4.2.1 Note that from the triangle inequality

$$\begin{aligned}
&\sup_{0 \le u \le T_0} |E_n(\exp(z\hat\beta),u) - E(\exp(z\beta),u)| \\
&\quad = \sup_{0 \le u \le T_0} |E_n(\exp(z\hat\beta),u) - E_n(\exp(z\beta),u) + E_n(\exp(z\beta),u) - E(\exp(z\beta),u)| \\
&\quad \le \sup_{0 \le u \le T_0} \{|E_n(\exp(z\hat\beta),u) - E_n(\exp(z\beta),u)| + |E_n(\exp(z\beta),u) - E(\exp(z\beta),u)|\}.
\end{aligned} \tag{4.14}$$

Therefore it suffices to show that

$$\sup_{0 \le u \le T_0} |E_n(\exp(z\hat\beta),u) - E_n(\exp(z\beta),u)| \to 0 \quad \text{a.s.} \tag{4.15}$$

and

$$\sup_{0 \le u \le T_0} |E_n(\exp(z\beta),u) - E(\exp(z\beta),u)| \to 0 \quad \text{a.s.} \tag{4.16}$$

In terms of the integral forms of $E_n(\exp(z\hat\beta),u)$ and $E_n(\exp(z\beta),u)$, the left-hand side of (4.15) can be bounded as

$$\begin{aligned}
&\sup_{0 \le u \le T_0} \left| \int_{\{z:\,T>u\}} \exp(z\hat\beta)\,dQ_n(z) - \int_{\{z:\,T>u\}} \exp(z\beta)\,dQ_n(z) \right| \\
&\quad = \sup_{0 \le u \le T_0} \left| \int_{\{z:\,T>u\}} (\exp(z\hat\beta) - \exp(z\beta))\,dQ_n(z) \right| \\
&\quad \le \sup_{0 \le u \le T_0} \int_{\{z:\,T>u\}} |\exp(z\hat\beta) - \exp(z\beta)|\,dQ_n(z) \\
&\quad \le \int_0^{z_0} |\exp(z\hat\beta) - \exp(z\beta)|\,dQ_n(z),
\end{aligned} \tag{4.17}$$

where $z_0$ and $0$ are assumed to be the upper bound and the lower bound of the random variable $Z$ respectively. Since $\hat\beta$ converges almost surely to $\beta$, $|\exp(z\hat\beta) - \exp(z\beta)|$ is bounded and converges to 0 almost surely. Moreover $dQ_n$ is a bounded measure. Hence the left-hand side of (4.15) converges almost surely to 0.

Tsiatis (1981, Appendix 1) shows that the Glivenko-Cantelli lemma (Chow & Teicher, 1978, p. 260) can be applied to prove that the left-hand side of (4.16) converges to 0 almost surely, since $E_n(\exp(z\beta),u)$ and $E(\exp(z\beta),u)$ are bounded by assumption and monotone (nonincreasing) in $u$. This completes the proof of Lemma 4.2.1.


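Corollary 4.2.1 Under the conditions of Lemma 4.2.1,

$$\sup_{0 \le \xi \le T_0} |\hat K_\xi(x) - K_\xi(x)| \to 0 \quad \text{as } n \to \infty.$$

Theorem 4.2.1 For $\hat\lambda_i$ defined by (4.10) and for every fixed $x_0$ with $0 < F(x_0) < 1$,

$$\lambda_0(x_0-) \le \liminf_{n\to\infty} \hat\lambda_i(x_0) \le \limsup_{n\to\infty} \hat\lambda_i(x_0) \le \lambda_0(x_0+). \tag{4.18}$$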


Proof of Theorem 4.2.1 To prove the first part of (4.18), let $\xi$ be an arbitrary point such that

$$F^{-1}(1) < \xi < x_0.$$

It follows from (4.10) that


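$$\hat\lambda_i(x_0) = \inf_{x_0 < y_{u(t)}}\ \sup_{y_{u(s)} \le x_0} \frac{F_n(y_{u(s-1)}) - F_n(y_{u(t)})}{\hat K_\xi(y_{u(t)}) - \hat K_\xi(y_{u(s-1)})} \ge \inf_{x_0 < x} \frac{F_n(\xi) - F_n(x)}{\hat K_\xi(x) - \hat K_\xi(\xi)}. \tag{4.19}$$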
Since $F_n$ converges to $F$, Corollary 4.2.1 implies

$$\liminf_{n\to\infty} \hat\lambda_i(x_0) \ge \inf_{x_0 < x} \frac{F(\xi) - F(x)}{K_\xi(x)} \ge \inf_{\xi < x_0 < x} \frac{F(\xi) - F(x)}{(F(\xi) - F(x))/\lambda_0(\xi)} = \lambda_0(\xi), \tag{4.20}$$

where the second inequality follows from (a) the monotonicity of $\lambda_0$ and $F$ and (b)

$$K_\xi(x) = \int_\xi^x E(\exp(z\beta),u)\,du \le \frac{1}{\lambda_0(\xi)}(F(\xi) - F(x)). \tag{4.21}$$

Since $\xi$ is an arbitrary number which is less than $x_0$, we have

$$\liminf_{n\to\infty} \hat\lambda_i(x_0) \ge \lambda_0(x_0-). \tag{4.22}$$


To prove the second part of (4.18), let $\xi$ be an arbitrary point such that

$$x_0 < \xi < F^{-1}(0).$$

We have from (4.10),

$$\hat\lambda_i(x_0) = \inf_{x_0 < y_{u(t)}}\ \sup_{y_{u(s)} \le x_0} \frac{F_n(y_{u(s-1)}) - F_n(y_{u(t)})}{\hat K_\xi(y_{u(t)}) - \hat K_\xi(y_{u(s-1)})} \le \sup_{x < x_0} \frac{F_n(x) - F_n(\xi)}{\hat K_\xi(\xi) - \hat K_\xi(x)}. \tag{4.23}$$

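Since $F_n$ converges to $F$ as $n \to \infty$, Lemma 4.2.1 implies

$$\limsup_{n\to\infty} \hat\lambda_i(x_0) \le \sup_{x < x_0} \frac{F(x) - F(\xi)}{-K_\xi(x)} \le \sup_{x < x_0 < \xi} \frac{F(x) - F(\xi)}{(F(x) - F(\xi))/\lambda_0(\xi)} = \lambda_0(\xi), \tag{4.24}$$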


where the second inequality follows from (a) the monotonicity of $\lambda_0$ and $F$ and (b)

$$-K_\xi(x) = \int_x^\xi E(\exp(z\beta),u)\,du \ge \frac{1}{\lambda_0(\xi)}(F(x) - F(\xi)).$$

Since $\xi$ is an arbitrary number which is greater than $x_0$, we have

$$\limsup_{n\to\infty} \hat\lambda_i(x_0) \le \lambda_0(x_0+). \tag{4.25}$$

By combining (4.22) and (4.25), it follows that the isotonic estimator of the baseline hazard function is strongly consistent, provided that $\lambda_0(x)$ is continuous:

$$\lim_{n\to\infty} \hat\lambda_i(x_0) = \lambda_0(x_0) \quad \text{a.s.}$$

where $\hat\lambda_i(x_0)$ is defined in (4.9).


CHAPTER  5 

THE  ISOTONIC  ESTIMATOR  BASED  ON  THE  WINDOW 

5.1  Introduction 

In Chapter 3, we considered estimators of the baseline hazard rate based on order statistics. In practical situations, the observed data are frequently grouped into intervals. In other words, we are able to count only the number of failures and the number of censored observations within specified intervals. Hence it is important to consider estimators corresponding to grouped data rather than the maximum likelihood estimator of the baseline hazard rate based on order statistics. A general class of isotonized fixed and random "window" estimators of the failure rate was proposed and discussed by Barlow and van Zwet (1969). They showed that the window estimators with appropriate window size have higher asymptotic efficiency than the unrestricted maximum likelihood estimator. Intuitively it seems reasonable that an improved estimator can be obtained by forcing the estimator to be monotone. In our case, we might expect that the estimator based on the window is closer to $\lambda_0(t)$ than the maximum likelihood estimator, by the criterion of mean square error. We are also interested in determining an appropriate size of window to optimize the maximum likelihood estimator.

Note that the maximum likelihood estimator of $\lambda_0(t)$ based on the ordered sample $\{y_{u(1)} < y_{u(2)} < \cdots < y_{u(k)}\}$ is

$$\hat\lambda^{0}(x) = \frac{1}{(y_{u(i)} - y_{u(i-1)}) \sum_{l \in R(y_{u(i)})} \exp(z_l\hat\beta)}, \quad i = 1, \dots, k,$$

for $y_{u(i-1)} < x \le y_{u(i)}$.
For each $n$, let us define a grid on $(0, \infty)$ over a finite or an infinite sequence $0 = t_{n,0} < t_{n,1} < \cdots < t_{n,i} < \cdots$. In each window $[t_{n,j}, t_{n,j+1})$, a point $x_{n,j}$, denoted by $x_j$ for simplicity, is chosen.

If $y_{u(1)} \le t_{n,i} \le x < t_{n,i+1} \le y_{u(k)}$, $\lambda_0(x)$ is estimated by

$$\lambda_0^*(x) = \frac{F_n(t_{n,i}) - F_n(t_{n,i+1})}{(t_{n,i+1} - t_{n,i})E_n(\exp(z\hat\beta),x)} \tag{5.1}$$

where $F_n$ and $E_n(\exp(z\hat\beta),x)$ are defined in Section 4.1.
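The basic window estimator (5.1) needs only counts of failures per window and a weighted count of subjects at risk. The following is a minimal sketch; the function name `window_hazard`, the unit-width grid and the four-subject sample are hypothetical, and $\hat\beta$ is assumed to have been estimated separately.

```python
import bisect
import math


def window_hazard(x, grid, times, deltas, covariates, beta_hat):
    """Basic window estimate (5.1) of the baseline hazard at x.

    grid is the increasing sequence t_{n,0} < t_{n,1} < ...; x must fall in
    some window [t_{n,i}, t_{n,i+1}) of the grid.
    """
    n = len(times)
    i = bisect.bisect_right(grid, x) - 1
    if i < 0 or i + 1 >= len(grid):
        raise ValueError("x lies outside the grid")
    lower, upper = grid[i], grid[i + 1]
    # F_n(lower) - F_n(upper): observed failures in (lower, upper], over n.
    failures = sum(1 for t, d in zip(times, deltas)
                   if d == 1 and lower < t <= upper) / n
    # E_n(exp(z beta_hat), x): weighted fraction still at risk just after x.
    at_risk = sum(math.exp(z * beta_hat)
                  for t, z in zip(times, covariates) if t > x) / n
    return failures / ((upper - lower) * at_risk)


# Hypothetical sample with one censored observation (delta = 0).
rate = window_hazard(1.2, [0.0, 1.0, 2.0, 3.0, 4.0],
                     times=[0.5, 1.5, 2.5, 3.5], deltas=[1, 1, 0, 1],
                     covariates=[0.0, 0.0, 0.0, 0.0], beta_hat=0.0)
```

The function divides by the weighted number at risk, so it is undefined once nobody is observed beyond `x`; a caller would need to guard against that case.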


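The corresponding isotonic estimator is

$$\lambda_I^*(x) = \min_{s \ge i+1} \max_{r \le i} \frac{\sum_{j=r}^{s-1} \lambda_0^*(x_j)\,\omega(x_j)}{\sum_{j=r}^{s-1} \omega(x_j)}$$

for $t_{n,i} \le x < t_{n,i+1}$, where the weights $\omega(x_j)$ are given by

$$\omega(x_j) = (t_{n,j+1} - t_{n,j})E_n(\exp(z\hat\beta),x_j).$$

$\lambda_I^*(x)$ is called the "isotonic estimator of $\lambda_0(x)$ with respect to the discrete measure $\omega$".

With $\lambda_0^*(x)$ for the initial estimator and weights

$$\omega(x_j) = (t_{n,j+1} - t_{n,j})E_n(\exp(z\hat\beta),x_j)$$

we obtain the isotonic estimator of $\lambda_0(x)$ for $t_{n,i} \le x < t_{n,i+1}$ by

$$\lambda_I^*(x) = \min_{s \ge i+1} \max_{r \le i} \frac{F_n(t_{n,r}) - F_n(t_{n,s})}{\sum_{j=r}^{s-1} (t_{n,j+1} - t_{n,j})E_n(\exp(z\hat\beta),x_j)}.$$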
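In practice the min-max formula is rarely evaluated literally. The sketch below uses the pool-adjacent-violators algorithm, a standard technique that is not named in the text, and compares it with a brute-force evaluation of the min-max formula. The `values` play the role of the basic estimates $\lambda_0^*(x_j)$ and the `weights` the role of $\omega(x_j)$; the numbers are hypothetical.

```python
def pava(values, weights):
    """Weighted nondecreasing isotonic regression (pool adjacent violators)."""
    blocks = []                      # each block: [weighted sum, weight, count]
    for v, w in zip(values, weights):
        blocks.append([v * w, w, 1])
        # Pool while the previous block mean exceeds the newest block mean.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, weight, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += weight
            blocks[-1][2] += count
    fitted = []
    for total, weight, count in blocks:
        fitted.extend([total / weight] * count)
    return fitted


def min_max(values, weights):
    """Literal evaluation of min over s >= i+1 of max over r <= i."""
    m = len(values)
    fitted = []
    for i in range(m):
        best = float("inf")
        for s in range(i + 1, m + 1):            # block ends just before s
            inner = max(
                sum(values[j] * weights[j] for j in range(r, s))
                / sum(weights[r:s])
                for r in range(i + 1))           # block starts at r <= i
            best = min(best, inner)
        fitted.append(best)
    return fitted


basic = [0.2, 0.5, 0.3, 0.9]      # hypothetical lambda_0^*(x_j)
omega = [1.0, 1.0, 2.0, 1.0]      # hypothetical weights omega(x_j)
by_pava = pava(basic, omega)
by_formula = min_max(basic, omega)
```

Both routines pool the second and third windows into the weighted mean $1.1/3$ and leave the other two estimates unchanged.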
5.2 Asymptotic Distribution of the Basic Estimators

We will assume that the size of the window is related to the sample size, as $t_{n,i+1} - t_{n,i} = cn^{-a}$ for $c > 0$ and $0 < a < 1$, in order to derive asymptotic approximations. Although in practice we may be interested in simultaneous estimation of $\lambda_0(x)$ at all points, we shall concentrate on the asymptotic behavior of $\lambda_0^*(x)$ where $x$ is considered fixed. For mathematical convenience, it will be assumed that

$$x_i = \frac{t_{n,i+1} + t_{n,i}}{2},$$

that is, $x_i$ is the midpoint of a grid spacing. In this chapter, the main objective is to find the asymptotic distribution of

$$\lambda_I^*(x) = \min_{s \ge i+1} \max_{r \le i} \frac{F_n(t_{n,r}) - F_n(t_{n,s})}{\sum_{j=r}^{s-1} (t_{n,j+1} - t_{n,j})E_n(\exp(z\hat\beta),x_j)} \tag{5.2}$$

where $\hat\beta$ is a maximum likelihood estimator of $\beta$ from the marginal likelihood.

Before considering the asymptotic properties of $\lambda_I^*(x)$, we must first derive the asymptotic distribution of $\lambda_0^*(x)$ in (5.1).

For given time $x$, we define the empirical distribution $Q_n(z)$ of $Z$ by

$$Q_n(z) = \frac{1}{n}\sum_{i=1}^{n} I[Z_i \le z, T_i > x].$$

From the definitions in Section 4.1 and the above definition, we have the following.

$$E(\exp(z\beta),x) = \int \exp(z\beta)H(x|z)\,dQ(z),$$

$$E_n(\exp(z\beta),x) = \frac{1}{n}\sum_{i=1}^{n} \exp(z_i\beta)I[T_i > x] = \int \exp(z\beta)\,dQ_n(z),$$

and similarly

$$E_n(\exp(z\hat\beta),x) = \frac{1}{n}\sum_{i=1}^{n} \exp(z_i\hat\beta)I[T_i > x] = \int \exp(z\hat\beta)\,dQ_n(z).$$

In order to apply the method developed by von Mises (1964) to derive the asymptotic results for $\lambda_0^*(x)$, the following definition is needed.

$$H(\epsilon,\eta) = \left[\int \exp[z(\epsilon\hat\beta + (1-\epsilon)\beta)]\,[\eta\,dQ_n(z) + (1-\eta)H(x|z)\,dQ(z)]\right]^{-1} \tag{5.3}$$

Substituting $\epsilon = \eta = 1$ and $\epsilon = \eta = 0$ in (5.3) respectively, we note that

$$H(1,1) = \frac{1}{E_n(\exp(z\hat\beta),x)} \tag{5.4}$$

and

$$H(0,0) = \frac{1}{E(\exp(z\beta),x)}. \tag{5.5}$$

A Taylor expansion of $H(\epsilon,\eta)$ at $(0,0)$ is utilized to obtain the form which approximates $\lambda_0^*(x)$ (Serfling 1980, chapter 6).

Putting $\epsilon = \eta = 1$ in a Taylor expansion of $H(\cdot,\cdot)$ yields

$$H(1,1) = H(0,0) + H'_\epsilon(0,0) + H'_\eta(0,0) + \text{h.o.t} \tag{5.6}$$

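where "h.o.t" means higher order terms. Next, we consider the terms $H'_\epsilon(0,0)$ and $H'_\eta(0,0)$:

$$H'_\epsilon(0,0) = -\frac{(\hat\beta - \beta)\int z\exp(z\beta)H(x|z)\,dQ(z)}{\{\int \exp(z\beta)H(x|z)\,dQ(z)\}^2} = -\frac{(\hat\beta - \beta)E(z\exp(z\beta),x)}{E^2(\exp(z\beta),x)} \tag{5.7}$$

$$H'_\eta(0,0) = -\frac{\int \exp(z\beta)\{dQ_n(z) - H(x|z)\,dQ(z)\}}{\{\int \exp(z\beta)H(x|z)\,dQ(z)\}^2} = -\frac{E_n(\exp(z\beta),x) - E(\exp(z\beta),x)}{E^2(\exp(z\beta),x)} \tag{5.8}$$

Hence, from (5.4) to (5.8), we have

$$\frac{1}{E_n(\exp(z\hat\beta),x)} = \frac{1}{E(\exp(z\beta),x)} - (\hat\beta - \beta)\frac{E(z\exp(z\beta),x)}{E^2(\exp(z\beta),x)} - \frac{E_n(\exp(z\beta),x) - E(\exp(z\beta),x)}{E^2(\exp(z\beta),x)} + \text{h.o.t} \tag{5.9}$$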


From (5.1), we can rewrite $\lambda_0^*(x)$ as

$$\lambda_0^*(x) = \frac{F_n(t_{n,i}) - F_n(t_{n,i+1})}{(t_{n,i+1} - t_{n,i})E(\exp(z\beta),x)} \left\{ 1 - (\hat\beta - \beta)\frac{E(z\exp(z\beta),x)}{E(\exp(z\beta),x)} - \frac{E_n(\exp(z\beta),x) - E(\exp(z\beta),x)}{E(\exp(z\beta),x)} + \text{h.o.t} \right\}. \tag{5.10}$$

It is straightforward to show that the higher order term is of order $O_p(n^{-1})$, since $(\hat\beta - \beta) = O_p(n^{-1/2})$ (Tsiatis, 1981) and $(E_n - E) = O_p(n^{-1/2})$ by the strong law of large numbers (Chow & Teicher, 1978). We shall show in Lemma 5.2.2 that the second and third terms of (5.10) are asymptotically negligible, so that we only need to consider the asymptotic properties of the first term of (5.10). Let us define

$$Y_n = \frac{F_n(t_{n,i}) - F_n(t_{n,i+1})}{(t_{n,i+1} - t_{n,i})E(\exp(z\beta),x)}. \tag{5.11}$$


Let us next consider the mean of $Y_n$. Since $E[F_n(x)] = F(x)$ for fixed $x$,

$$E[Y_n] = E\left[\frac{F_n(t_{n,i}) - F_n(t_{n,i+1})}{(t_{n,i+1} - t_{n,i})E(\exp(z\beta),x)}\right] = \frac{F(t_{n,i}) - F(t_{n,i+1})}{(t_{n,i+1} - t_{n,i})E(\exp(z\beta),x)}. \tag{5.12}$$


Utilization of the Taylor expansion of $F(t_{n,i+1})$ at $x$, where $x = (t_{n,i+1} + t_{n,i})/2$, yields

$$F(t_{n,i+1}) = F(x) + (t_{n,i+1} - x)F^{(1)}(x) + \frac{1}{2}(t_{n,i+1} - x)^2F^{(2)}(x) + \frac{1}{6}(t_{n,i+1} - x)^3F^{(3)}(x) + \frac{1}{24}(t_{n,i+1} - x)^4F^{(4)}(x^*)$$

where $x^*$ is between $x$ and $t_{n,i+1}$. Similarly we obtain

$$F(t_{n,i}) = F(x) + (t_{n,i} - x)F^{(1)}(x) + \frac{1}{2}(t_{n,i} - x)^2F^{(2)}(x) + \frac{1}{6}(t_{n,i} - x)^3F^{(3)}(x) + \frac{1}{24}(t_{n,i} - x)^4F^{(4)}(x^{**})$$

where $x^{**}$ is between $t_{n,i}$ and $x$. Hence $F(t_{n,i}) - F(t_{n,i+1})$ reduces to

$$F(t_{n,i}) - F(t_{n,i+1}) = -F^{(1)}(x)cn^{-a} + \frac{1}{24}c^3n^{-3a}F^{(3)}(x) + qn^{-4a} \tag{5.13}$$

assuming $|F^{(4)}(\cdot)| < \infty$, where $q$ is some constant. Hence, from (5.11) and (5.12) we obtain the asymptotic mean of $Y_n$ for fixed $x$,

$$E\left[\frac{F_n(t_{n,i}) - F_n(t_{n,i+1})}{cn^{-a}E(\exp(z\beta),x)}\right] = \frac{-F^{(1)}(x)}{E(\exp(z\beta),x)} + \frac{c^2n^{-2a}}{24}\frac{F^{(3)}(x)}{E(\exp(z\beta),x)} + O(n^{-3a}). \tag{5.14}$$

Note that from (4.7), the first term on the right hand side of (5.14) is $\lambda_0(x)$.

Next, we find the asymptotic variance of $Y_n$, using the variance formula for a binomial random variable.

$$\mathrm{Var}[Y_n] = \mathrm{Var}\left[\frac{F_n(t_{n,i}) - F_n(t_{n,i+1})}{(t_{n,i+1} - t_{n,i})E(\exp(z\beta),x)}\right] = \frac{\{F(t_{n,i}) - F(t_{n,i+1})\}\{1 - (F(t_{n,i}) - F(t_{n,i+1}))\}}{c^2n^{-2a}E^2(\exp(z\beta),x)\,n} \tag{5.15}$$

which is simplified, using (5.12), to the following:

$$\begin{aligned}
\mathrm{Var}[Y_n] &= \frac{1}{c^2n^{-2a+1}E^2(\exp(z\beta),x)} \{[F(t_{n,i}) - F(t_{n,i+1})] - [F(t_{n,i}) - F(t_{n,i+1})]^2\} \\
&= \frac{1}{c^2n^{-2a+1}E^2(\exp(z\beta),x)} \left(-cn^{-a}F^{(1)}(x) - c^2n^{-2a}[F^{(1)}(x)]^2 + Bn^{-3a}\right) \\
&= \frac{-F^{(1)}(x)}{cE^2(\exp(z\beta),x)}n^{-1+a} + O(n^{-1})
\end{aligned} \tag{5.16}$$

where $B$ is a positive constant.

The  asymptotic  distribution  of  Yn  is  presented  in  the  next  lemma. 


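Lemma 5.2.1 Let us assume that $\lambda_0(x)$ is continuous and differentiable and that $F^{(3)}(x)$ exists. If $t_{n,i+1} - t_{n,i} = cn^{-a}$ for $c > 0$ and $0 < a < 1$, then

$$\left[\frac{-F^{(1)}(x)}{c}\right]^{-1/2} E(\exp(z\beta),x)\,n^{(1-a)/2} \left(Y_n - \lambda_0(x) - \frac{c^2n^{-2a}F^{(3)}(x)}{24E(\exp(z\beta),x)}\right)$$

has an asymptotic standard normal distribution for $\frac{1}{7} < a \le \frac{1}{5}$.

If $\frac{1}{5} < a < 1$, then

$$\left[\frac{-F^{(1)}(x)}{c}\right]^{-1/2} E(\exp(z\beta),x)\,n^{(1-a)/2}\,(Y_n - \lambda_0(x))$$

has an asymptotic standard normal distribution.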
Proof of Lemma 5.2.1 Let us define

$$X_j = I[t_{n,i} < T_j \le t_{n,i+1},\ \delta_j = 1]$$

for $j = 1, \dots, n$. It follows that

$$\bar X_n \stackrel{\mathrm{def}}{=} \frac{1}{n}\sum_{j=1}^{n} X_j = F_n(t_{n,i}) - F_n(t_{n,i+1}).$$

Noting that the $X_j$'s are i.i.d. Bernoulli random variables with parameter $F(t_{n,i}) - F(t_{n,i+1})$, it follows that $n(F_n(t_{n,i}) - F_n(t_{n,i+1}))$ is distributed as a binomial random variable with parameters $n$ and $F(t_{n,i}) - F(t_{n,i+1})$. To prove the lemma, we shall prove that

$$Z_n \stackrel{\mathrm{def}}{=} \frac{\dfrac{\bar X_n}{cn^{-a}E(\exp(z\beta),x)} - \dfrac{F(t_{n,i}) - F(t_{n,i+1})}{cn^{-a}E(\exp(z\beta),x)}}{\sqrt{\dfrac{(F(t_{n,i}) - F(t_{n,i+1}))(1 - F(t_{n,i}) + F(t_{n,i+1}))}{c^2n^{1-2a}E^2(\exp(z\beta),x)}}}$$

has an asymptotic standard normal distribution. To prove the asymptotic normality of $Z_n$, we show that the moment generating function of $Z_n$ converges to the moment generating function of the standard normal distribution.


Writing $p_n = F(t_{n,i}) - F(t_{n,i+1})$ for brevity,

$$\begin{aligned}
M_{Z_n}(t) &= E[\exp(tZ_n)] \\
&= E\left[\exp\left(t\,\frac{\dfrac{\bar X_n}{cn^{-a}E(\exp(z\beta),x)} - \dfrac{p_n}{cn^{-a}E(\exp(z\beta),x)}}{\sqrt{\dfrac{p_n(1-p_n)}{c^2n^{1-2a}E^2(\exp(z\beta),x)}}}\right)\right] \\
&= E\left[\exp\left(t\sum_{j=1}^{n} \frac{X_j - p_n}{\sqrt{np_n(1-p_n)}}\right)\right] \\
&= \left\{E\left[\exp\left(t\,\frac{X_1 - p_n}{\sqrt{np_n(1-p_n)}}\right)\right]\right\}^n \\
&= \left\{p_n\exp\left(t\,\frac{1 - p_n}{\sqrt{np_n(1-p_n)}}\right) + (1-p_n)\exp\left(t\,\frac{-p_n}{\sqrt{np_n(1-p_n)}}\right)\right\}^n \\
&= \left\{p_n\left(1 + t\,\frac{1-p_n}{\sqrt{np_n(1-p_n)}} + \frac{t^2}{2}\frac{(1-p_n)^2}{np_n(1-p_n)} + o(n^{-1})\right)\right. \\
&\qquad \left. + (1-p_n)\left(1 - t\,\frac{p_n}{\sqrt{np_n(1-p_n)}} + \frac{t^2}{2}\frac{p_n^2}{np_n(1-p_n)} + o(n^{-1})\right)\right\}^n \\
&= \left\{p_n + \frac{t^2}{2}\frac{p_n(1-p_n)^2}{np_n(1-p_n)} + (1-p_n) + \frac{t^2}{2}\frac{(1-p_n)p_n^2}{np_n(1-p_n)} + o(n^{-1})\right\}^n \\
&= \left(1 + \frac{t^2}{2n} + o(n^{-1})\right)^n \to e^{t^2/2} \quad \text{as } n \to \infty.
\end{aligned}$$


Hence from the Levy continuity theorem (Chow & Teicher, 1978), it follows that the standardized first term of (5.10),

$$\frac{\dfrac{F_n(t_{n,i}) - F_n(t_{n,i+1})}{cn^{-a}E(\exp(z\beta),x)} - \dfrac{F(t_{n,i}) - F(t_{n,i+1})}{cn^{-a}E(\exp(z\beta),x)}}{\sqrt{\dfrac{(F(t_{n,i}) - F(t_{n,i+1}))(1 - F(t_{n,i}) + F(t_{n,i+1}))}{c^2n^{1-2a}E^2(\exp(z\beta),x)}}},$$

has an asymptotic standard normal distribution.
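The convergence of the moment generating function used in this proof can be observed numerically. The sketch below evaluates the exact moment generating function of a standardized binomial count and compares it with $e^{t^2/2}$ at $t = 1$. The window probability `c * n**(-a) * density` is a hypothetical stand-in for $F(t_{n,i}) - F(t_{n,i+1})$, and the values of `a`, `c` and `density` are illustrative assumptions.

```python
import math


def standardized_binomial_mgf(t, n, p):
    """MGF at t of (S - n p) / sqrt(n p (1 - p)) for S ~ Binomial(n, p)."""
    scale = math.sqrt(n * p * (1.0 - p))
    per_trial = (p * math.exp(t * (1.0 - p) / scale)
                 + (1.0 - p) * math.exp(-t * p / scale))
    # Take the n-th power on the log scale to keep it numerically stable.
    return math.exp(n * math.log(per_trial))


def window_probability(n, a=0.25, c=1.0, density=0.4):
    """Hypothetical shrinking success probability of order n**(-a)."""
    return c * n ** (-a) * density


target = math.exp(0.5)               # standard normal MGF at t = 1
errors = [abs(standardized_binomial_mgf(1.0, n, window_probability(n)) - target)
          for n in (10 ** 2, 10 ** 4, 10 ** 6)]
```

The errors shrink as `n` grows because $np_n \to \infty$ whenever $a < 1$, which is the condition that makes the remainder in the expansion $o(n^{-1})$.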

To simplify the notation, the following four terms are defined:

$$A_n = \lambda_0(x) + \frac{c^2n^{-2a}F^{(3)}(x)}{24E(\exp(z\beta),x)} + O(n^{-3a}),$$

$$B_n = \frac{F(t_{n,i}) - F(t_{n,i+1})}{cn^{-a}E(\exp(z\beta),x)},$$

$$V_n^2 = \frac{(F(t_{n,i}) - F(t_{n,i+1}))(1 - F(t_{n,i}) + F(t_{n,i+1}))}{c^2n^{1-2a}E^2(\exp(z\beta),x)},$$

$$U_n^2 = \frac{-F^{(1)}(x)}{cE^2(\exp(z\beta),x)}n^{-1+a}.$$

From the asymptotic result obtained above, we have

$$\frac{Y_n - B_n}{V_n} \to N(0,1).$$

Since we aim to find the asymptotic distribution of $(Y_n - A_n)/U_n$, we rewrite

$$\frac{Y_n - A_n}{U_n} = \frac{Y_n - B_n + B_n - A_n}{U_n} = \frac{Y_n - B_n}{V_n}\cdot\frac{V_n}{U_n} + \frac{B_n - A_n}{U_n}.$$

Note that

$$\lim_{n\to\infty} \frac{V_n^2}{U_n^2} = \lim_{n\to\infty} \frac{\dfrac{(F(t_{n,i}) - F(t_{n,i+1}))(1 - F(t_{n,i}) + F(t_{n,i+1}))}{c^2n^{1-2a}E^2(\exp(z\beta),x)}}{\dfrac{-F^{(1)}(x)}{cE^2(\exp(z\beta),x)}n^{-1+a}} = 1$$

and when $\frac{1}{7} < a < 1$, following from (4.7) and (5.12),

$$\lim_{n\to\infty} \frac{B_n - A_n}{U_n} = -\lim_{n\to\infty} \frac{\lambda_0(x) + \dfrac{1}{24}\dfrac{c^2n^{-2a}F^{(3)}(x)}{E(\exp(z\beta),x)} + O(n^{-3a}) - \dfrac{F(t_{n,i}) - F(t_{n,i+1})}{cn^{-a}E(\exp(z\beta),x)}}{\sqrt{\dfrac{-F^{(1)}(x)}{cE^2(\exp(z\beta),x)}n^{-1+a}}} = 0.$$


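Therefore it follows that $(Y_n - A_n)/U_n$ is asymptotically distributed as $N(0,1)$. That is, when $\frac{1}{7} < a < 1$,

$$\left[\frac{-F^{(1)}(x)}{c}\right]^{-1/2} E(\exp(z\beta),x)\,n^{(1-a)/2} \left(Y_n - \lambda_0(x) - \frac{c^2n^{-2a}F^{(3)}(x)}{24E(\exp(z\beta),x)} - O(n^{-3a})\right)$$

has an asymptotic standard normal distribution. For $\frac{1}{7} < a \le \frac{1}{5}$, $O(n^{-3a})$ is asymptotically negligible, and for $\frac{1}{5} < a < 1$, $\dfrac{c^2n^{-2a}F^{(3)}(x)}{24E(\exp(z\beta),x)}$ is asymptotically negligible because of the factor $n^{(1-a)/2}$. This completes the proof of the lemma.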

Next, let us show that the second and third terms of $\lambda_0^*(x)$ in (5.10) are asymptotically negligible. In other words, we shall show that they converge to 0 in probability.

Lemma 5.2.2 For $0 < a < 1$, the sum of the second and third terms of $\lambda_0^*(x)$, multiplied by $n^{(1-a)/2}$,

$$n^{(1-a)/2}\,\frac{F_n(t_{n,i}) - F_n(t_{n,i+1})}{(t_{n,i+1} - t_{n,i})E^2(\exp(z\beta),x)} \{E(z\exp(z\beta),x)(\hat\beta - \beta) + (E_n(\exp(z\beta),x) - E(\exp(z\beta),x))\},$$

converges to 0 in probability.
Proof of Lemma 5.2.2 From (5.13), (5.15) and Chebyshev's theorem,

$$Y_n = \frac{F_n(t_{n,i}) - F_n(t_{n,i+1})}{(t_{n,i+1} - t_{n,i})E(\exp(z\beta),x)}$$

converges to $\dfrac{-F^{(1)}(x)}{E(\exp(z\beta),x)}$ in probability. Since we assume that

$$E(z\exp(z\beta),x) < \infty,$$

it suffices to show that $n^{(1-a)/2}(\hat\beta - \beta)$ and $n^{(1-a)/2}(E_n(\exp(z\beta),x) - E(\exp(z\beta),x))$ converge to 0 in probability. It is well known that $\sqrt{n}(\hat\beta - \beta) = O_p(1)$ and $\sqrt{n}(E_n(\exp(z\beta),x) - E(\exp(z\beta),x)) = O_p(1)$, so that

$$n^{-a/2}\sqrt{n}(\hat\beta - \beta) = n^{-a/2}O_p(1) \to 0,$$

and

$$n^{-a/2}\sqrt{n}(E_n(\exp(z\beta),x) - E(\exp(z\beta),x)) = n^{-a/2}O_p(1) \to 0.$$

This completes the proof of the lemma.

Theorem 5.2.1 The standardized form of $\lambda_0^*(x)$,

$$\left[\frac{-F^{(1)}(x)}{c}\right]^{-1/2} E(\exp(z\beta),x)\,n^{(1-a)/2} \left(\lambda_0^*(x) - \lambda_0(x) - \frac{c^2n^{-2a}F^{(3)}(x)}{24E(\exp(z\beta),x)}\right),$$

has an asymptotic standard normal distribution for $\frac{1}{7} < a \le \frac{1}{5}$. If we have $\frac{1}{5} < a < 1$, it follows that

$$\left[\frac{-F^{(1)}(x)}{c}\right]^{-1/2} E(\exp(z\beta),x)\,n^{(1-a)/2}\,(\lambda_0^*(x) - \lambda_0(x))$$

has an asymptotic standard normal distribution.

Proof of Theorem 5.2.1 Since we proved in Lemma 5.2.1 that the normalized first term of $\lambda_0^*(x)$ is asymptotically distributed as a standard normal random variable, and since we proved in Lemma 5.2.2 that the second and third terms of (5.10) with the factor $n^{(1-a)/2}$ converge to 0 in probability, Slutsky's theorem (Chow & Teicher, 1978) implies that $\lambda_0^*(x)$ and $Y_n = \dfrac{F_n(t_{n,i}) - F_n(t_{n,i+1})}{(t_{n,i+1} - t_{n,i})E(\exp(z\beta),x)}$ have the same asymptotic distribution. Thus the conclusions in the theorem follow from Lemma 5.2.1.

5.3  Asymptotic  Distribution  of  Isotonic  Estimators  Based  on  the  Window 

In this section, we shall derive the asymptotic distribution of the isotonic regression of $\lambda_0^*(x)$ for the wide window case (i.e., grid spacing of the form $cn^{-a}$ where $0 < a < \frac{1}{3}$). The wide window case is important, since the isotonic estimator of $\lambda_0(x)$ has an asymptotic normal distribution in this case.

Barlow and van Zwet (1971) proved this result for the case without covariates. The basic estimator for the failure rate based on ordered observations and the isotonic estimator based on the window are asymptotically equivalent. Similarly we shall prove that $\lambda_0^*(x)$ and $\lambda_I^*(x)$ are asymptotically equivalent under mild regularity conditions.

Theorem 5.3.1 If

(i) $\lambda_0(x)$ is strictly increasing in $x \ge 0$;

(ii) $\lambda_0(x)$ is continuously differentiable and $F^{(3)}(\cdot)$ exists in a neighborhood of $x$;

(iii) $t_{n,i+1} - t_{n,i} = cn^{-a}$ and $0 < a < \frac{1}{3}$;

it follows that

$$\lim_{n\to\infty} \Pr[\lambda_0^*(x) \ne \lambda_I^*(x)] = 0. \tag{5.17}$$

Prior  to  proving  this  result,  three  lemmas  are  needed. 

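Lemma 5.3.1 For $0 < a < \frac{1}{3}$ and arbitrary numbers $\epsilon > 0$, $\delta > 0$ and $0 < B < T_0$,

$$\Pr\left[\sup_{0 \le x \le B} |\lambda_0^*(x) - \lambda_0(x)| > \epsilon\right] < \delta \quad \text{for large } n$$

where

$$\lambda_0^*(x) = \frac{F_n(t_{n,i}) - F_n(t_{n,i+1})}{(t_{n,i+1} - t_{n,i})E_n(\exp(z\hat\beta),x)}.$$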

Proof of Lemma 5.3.1 Using (5.10), we have for arbitrary $x$, writing $\Delta_n = t_{n,i+1} - t_{n,i}$ and $E(x) = E(\exp(z\beta),x)$ for brevity,

$$\begin{aligned}
\lambda_0^*(x) - \lambda_0(x) &= \frac{F_n(t_{n,i}) - F_n(t_{n,i+1})}{\Delta_n E_n(\exp(z\hat\beta),x)} - \frac{F(t_{n,i}) - F(t_{n,i+1})}{\Delta_n E(x)} + \frac{F(t_{n,i}) - F(t_{n,i+1})}{\Delta_n E(x)} - \lambda_0(x) \\
&= \frac{F_n(t_{n,i}) - F_n(t_{n,i+1})}{\Delta_n E(x)} \left\{1 - (\hat\beta - \beta)\frac{E(z\exp(z\beta),x)}{E(x)} - \frac{E_n(\exp(z\beta),x) - E(x)}{E(x)}\right\} \\
&\qquad - \frac{F(t_{n,i}) - F(t_{n,i+1})}{\Delta_n E(x)} + \frac{F(t_{n,i}) - F(t_{n,i+1})}{\Delta_n E(x)} - \lambda_0(x) + \text{h.o.t} \\
&= \frac{1}{\Delta_n E(x)} \{[F_n(t_{n,i}) - F(t_{n,i})] - [F_n(t_{n,i+1}) - F(t_{n,i+1})]\} \\
&\qquad + \frac{1}{\Delta_n E(x)} \left\{-F^{(1)}(x)cn^{-a} + \frac{1}{2}(r^2 - (1-r)^2)c^2n^{-2a}F^{(2)}(x) + \frac{1}{6}(r^3 + (1-r)^3)c^3n^{-3a}F^{(3)}(x)\right\} - \lambda_0(x) \\
&\qquad - \frac{F_n(t_{n,i}) - F_n(t_{n,i+1})}{\Delta_n E(x)}(\hat\beta - \beta)\frac{E(z\exp(z\beta),x)}{E(x)} \\
&\qquad - \frac{F_n(t_{n,i}) - F_n(t_{n,i+1})}{\Delta_n E(x)}\cdot\frac{E_n(\exp(z\beta),x) - E(x)}{E(x)} + \text{h.o.t} \\
&= I + II + III + IV + \text{h.o.t}
\end{aligned} \tag{5.18}$$

where $|t_{n,i} - x| = rcn^{-a}$ for $0 \le r \le 1$.

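Let us consider the first part (I) of (5.18):

$$\begin{aligned}
&\Pr\left[\sup_{0 \le x \le B} \left|\frac{1}{(t_{n,i+1} - t_{n,i})E(\exp(z\beta),x)} \{[F_n(t_{n,i}) - F(t_{n,i})] - [F_n(t_{n,i+1}) - F(t_{n,i+1})]\}\right| > \epsilon\right] \\
&\quad \le \Pr\left[\sup_{0 \le x \le B} \frac{1}{(t_{n,i+1} - t_{n,i})E(\exp(z\beta),x)} \{|F_n(t_{n,i}) - F(t_{n,i})| + |F_n(t_{n,i+1}) - F(t_{n,i+1})|\} > \epsilon\right] \\
&\quad \le \Pr\left[\sup_{0 \le x \le B} \frac{n^a}{cE(\exp(z\beta),x)} |F_n(t_{n,i}) - F(t_{n,i})| > \frac{\epsilon}{2}\right] + \Pr\left[\sup_{0 \le x \le B} \frac{n^a}{cE(\exp(z\beta),x)} |F_n(t_{n,i+1}) - F(t_{n,i+1})| > \frac{\epsilon}{2}\right] \\
&\quad < \delta \quad \text{for large } n.
\end{aligned} \tag{5.19}$$

The last line follows, since

(i) $E(\exp(z\beta),x)$ is bounded on $[0,B]$,

(ii) $\Pr[\sup_{0 \le x \le B} n^a|F_n(t_{n,i+1}) - F(t_{n,i+1})| > \epsilon] < \delta$ for large $n$ and for $a$ in the range assumed in the lemma.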


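It is straightforward to show that the second part of (5.18) goes to 0 as $n$ goes to $\infty$. Let us consider the third part of (5.18):

$$\begin{aligned}
&\Pr\left[\sup_{0 \le x \le B} \left|\frac{F_n(t_{n,i}) - F_n(t_{n,i+1})}{(t_{n,i+1} - t_{n,i})E(\exp(z\beta),x)}(\hat\beta - \beta)\frac{E(z\exp(z\beta),x)}{E(\exp(z\beta),x)}\right| > \epsilon\right] \\
&\quad \le \Pr\left[2c^{-1}n^a|\hat\beta - \beta| \sup_{0 \le x \le B} \frac{E(z\exp(z\beta),x)}{E^2(\exp(z\beta),x)} > \epsilon\right] < \delta.
\end{aligned} \tag{5.20}$$

The last line follows since $E(z\exp(z\beta),x)$ and $E(\exp(z\beta),x)$ are bounded on $[0,B]$ and $n^a(\hat\beta - \beta)$ goes to 0 as $n$ goes to $\infty$ for $a$ in the range assumed in the lemma.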

Finally, consider the fourth part of (5.18):

$$\begin{aligned}
&\Pr\left[\sup_{0 \le x \le B} \left|\frac{F_n(t_{n,i}) - F_n(t_{n,i+1})}{(t_{n,i+1} - t_{n,i})E(\exp(z\beta),x)}\cdot\frac{E_n(\exp(z\beta),x) - E(\exp(z\beta),x)}{E(\exp(z\beta),x)}\right| > \epsilon\right] \\
&\quad \le \Pr\left[2c^{-1} \sup_{0 \le x \le B} \frac{n^a\,|E_n(\exp(z\beta),x) - E(\exp(z\beta),x)|}{E^2(\exp(z\beta),x)} > \epsilon\right] < \delta.
\end{aligned} \tag{5.21}$$

The last line follows from the fact that $n^{1/2}\{E_n(\exp(z\beta),x) - E(\exp(z\beta),x)\}$ converges to a Gaussian process (Tsiatis, 1981). It was shown in Section 5.2 that the h.o.t is negligible. Combining (5.19), (5.20) and (5.21), the proof of the lemma is complete.
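Let us define $\lambda_I^{*[A,B]}(x)$ for any $A$ and $B$ ($0 \le A < x < B \le T_0$):

$$\lambda_I^{*[A,B]}(x) = \min_{\substack{s \ge i+1 \\ t_{n,s} \le B}}\ \max_{\substack{r \le i \\ t_{n,r} \ge A}} \frac{F_n(t_{n,r}) - F_n(t_{n,s})}{\sum_{j=r}^{s-1} (t_{n,j+1} - t_{n,j})E_n(\exp(z\hat\beta),x_j)}.$$

It is clear that

(i) $\lambda_I^*(x) = \lambda_I^{*[0,T_0]}(x)$

(ii) $\lambda_I^*(x) \le \lambda_I^{*[0,B]}(x)$

(iii) $\lambda_I^*(x) \ge \lambda_I^{*[A,T_0]}(x)$.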

Next,  we  prove  the  following  lemma,  which  allows  us  to  consider  only  a bounded 
range  of  time  for  proving  Theorem  5.3.1. 

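Lemma 5.3.2 For a fixed $x$ and any $A$ and $B$, $0 \le A < x < B \le T_0$,

$$\lim_{n\to\infty} \Pr[\lambda_I^*(x) \ne \lambda_I^{*[A,B]}(x)] = 0.$$

Proof of Lemma 5.3.2 The proof of this lemma consists of proving the following two results:

$$\lim_{n\to\infty} \Pr[\lambda_I^*(x) \ne \lambda_I^{*[0,B]}(x)] = 0 \tag{5.22}$$

and

$$\lim_{n\to\infty} \Pr[\lambda_I^{*[0,B]}(x) \ne \lambda_I^{*[A,B]}(x)] = 0. \tag{5.23}$$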
Since the proofs of (5.22) and (5.23) are similar, we shall prove (5.22). In other words, we need to prove that for any $\delta_1 > 0$, there exists $N$ such that whenever $n > N$,

$$\Pr[\lambda_I^*(x) \ne \lambda_I^{*[0,B]}(x)] < \delta_1. \tag{5.24}$$

If $B = T_0$, then $\lambda_I^{*[0,B]}(x) = \lambda_I^*(x)$ and the proof is completed. Assume $B < T_0$. Let us divide $[x, T_0]$ into

$$[x, T_0] = \underbrace{[x, C_1)}_{I_1} \cup \underbrace{[C_1, C_2)}_{I_2} \cup \underbrace{[C_2, B)}_{I_3} \cup \underbrace{[B, B_1)}_{I_4} \cup \underbrace{[B_1, T_0]}_{I_5}$$

where $x$ is a fixed point and $B_1$ is close to $T_0$. Let $C_1$ and $C_2$ be any two points satisfying

$$\lambda_0(C_2) - \lambda_0(C_1) > 2\epsilon \tag{5.25}$$

$$\lambda_0(B) - \lambda_0(C_2) > \delta + \epsilon \tag{5.26}$$

for some $\delta > 0$.

From Lemma 5.3.1, for any $\epsilon$ and $\delta > 0$, there exists $N_1$ such that whenever $n > N_1$

$$\Pr\left[\sup_{0 \le x \le B_1} |\lambda_0^*(x) - \lambda_0(x)| < \epsilon\right] > 1 - \frac{\delta}{3}.$$

Denote the subset of all sample points satisfying

$$\sup_{0 \le x \le B_1} |\lambda_0^*(x) - \lambda_0(x)| < \epsilon$$

as $\Omega_1$. Then $\Pr(\Omega_1) > 1 - \frac{\delta}{3}$ if $n > N_1$.


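Define

$$I_m(x) = \begin{cases} 1 & \text{if } x \in I_m, \\ 0 & \text{otherwise} \end{cases}$$

for $m = 1, \dots, 5$.

Denote the subset of all sample points satisfying

$$\frac{\sum_{j=i}^{s-1} \omega(x_j)I_4(x_j)}{\sum_{j=i}^{s-1} \omega(x_j)} \ge W > 0$$

for some $W > 0$ as $\Omega_2$. Then $\Pr(\Omega_2) > 1 - \frac{\delta}{3}$ if $n > N_2$. Denote the subset of all sample points satisfying

$$\frac{\sum_{j=i}^{s-1} \omega(x_j)I_5(x_j)}{\sum_{j=i}^{s-1} \omega(x_j)} \le \frac{\epsilon}{J}$$

as $\Omega_3$. Then $\Pr(\Omega_3) > 1 - \frac{\delta}{3}$ if $n > N_3$.

Now we focus our discussion upon fixed sample points in $\Omega$ such that when $n > \max\{N_1, N_2, N_3\}$, $\Pr(\Omega) > 1 - \delta$, where $\Omega = \Omega_1 \cap \Omega_2 \cap \Omega_3$.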

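Since we note

$$\lambda_I^*(x) \le \lambda_I^{*[0,B]}(x),$$

it suffices to show

$$\lambda_I^*(x) \ge \lambda_I^{*[0,B]}(x)$$

for fixed $x$. If

$$\min_{\substack{s \ge i+1 \\ t_{n,s} > B}} \frac{\sum_{j=i}^{s-1} \lambda_0^*(x_j)\,\omega(x_j)}{\sum_{j=i}^{s-1} \omega(x_j)} > \min_{\substack{s \ge i+1 \\ t_{n,s} < C_1}} \frac{\sum_{j=i}^{s-1} \lambda_0^*(x_j)\,\omega(x_j)}{\sum_{j=i}^{s-1} \omega(x_j)}, \tag{5.27}$$

it follows from the fact that $C_1$ is an arbitrary point less than $B$ that

$$\min_{\substack{s \ge i+1 \\ t_{n,s} \le B}} \frac{\sum_{j=r}^{s-1} \lambda_0^*(x_j)\,\omega(x_j)}{\sum_{j=r}^{s-1} \omega(x_j)} = \min_{s \ge i+1} \frac{\sum_{j=r}^{s-1} \lambda_0^*(x_j)\,\omega(x_j)}{\sum_{j=r}^{s-1} \omega(x_j)}. \tag{5.28}$$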
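Next, using Lemma 5.3.1, the monotonicity of $\lambda_0(x)$, (5.25), and (5.26), we observe that

$$J \stackrel{\mathrm{def}}{=} \frac{\sum_{j=i}^{s-1} \lambda_0^*(x_j)\,\omega(x_j)I_1(x_j)}{\sum_{j=i}^{s-1} \omega(x_j)I_1(x_j)} \le \frac{\sum_{j=i}^{s-1} \lambda_0(x_j)\,\omega(x_j)I_1(x_j)}{\sum_{j=i}^{s-1} \omega(x_j)I_1(x_j)} + \epsilon \le \lambda_0(C_1) + \epsilon, \tag{5.29}$$

$$\frac{\sum_{j=i}^{s-1} \lambda_0^*(x_j)\,\omega(x_j)I_2(x_j)}{\sum_{j=i}^{s-1} \omega(x_j)I_2(x_j)} \ge \frac{\sum_{j=i}^{s-1} \lambda_0(x_j)\,\omega(x_j)I_2(x_j)}{\sum_{j=i}^{s-1} \omega(x_j)I_2(x_j)} - \epsilon \ge \lambda_0(C_1) - \epsilon \ge J - 2\epsilon, \tag{5.30}$$

$$\frac{\sum_{j=i}^{s-1} \lambda_0^*(x_j)\,\omega(x_j)I_3(x_j)}{\sum_{j=i}^{s-1} \omega(x_j)I_3(x_j)} \ge \frac{\sum_{j=i}^{s-1} \lambda_0(x_j)\,\omega(x_j)I_3(x_j)}{\sum_{j=i}^{s-1} \omega(x_j)I_3(x_j)} - \epsilon \ge \lambda_0(C_2) - \epsilon \ge \lambda_0(C_1) + \epsilon \ge J, \tag{5.31}$$

$$\frac{\sum_{j=i}^{s-1} \lambda_0^*(x_j)\,\omega(x_j)I_4(x_j)}{\sum_{j=i}^{s-1} \omega(x_j)I_4(x_j)} \ge \frac{\sum_{j=i}^{s-1} \lambda_0(x_j)\,\omega(x_j)I_4(x_j)}{\sum_{j=i}^{s-1} \omega(x_j)I_4(x_j)} - \epsilon \ge \lambda_0(B) - \epsilon \ge \lambda_0(C_2) + \delta \ge \lambda_0(C_1) + 2\epsilon + \delta \ge J + \epsilon + \delta. \tag{5.32}$$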
We also observe that

$$\frac{\sum_{j=i}^{s-1} \omega(x_j)I_4(x_j)}{\sum_{j=i}^{s-1} \omega(x_j)} = \frac{(t_{n,s} - B)\sum_{j=i}^{s-1} \omega(x_j)I_4(x_j)/(t_{n,s} - B)}{(t_{n,s} - t_{n,i})\sum_{j=i}^{s-1} \omega(x_j)/(t_{n,s} - t_{n,i})} \ge r\,\frac{t_{n,s} - B}{t_{n,s} - t_{n,i}} \ge W \tag{5.33}$$

for large $n$ and constants $r > 0$ and $W > 0$, and since $B_1$ is close to $T_0$, there exists $N$ such that for $n > N$, we have

$$\frac{\sum_{j=i}^{s-1} \omega(x_j)I_5(x_j)}{\sum_{j=i}^{s-1} \omega(x_j)} = \frac{(t_{n,s} - B_1)\sum_{j=i}^{s-1} \omega(x_j)I_5(x_j)/(t_{n,s} - B_1)}{(t_{n,s} - t_{n,i})\sum_{j=i}^{s-1} \omega(x_j)/(t_{n,s} - t_{n,i})} \le r\,\frac{t_{n,s} - B_1}{t_{n,s} - t_{n,i}} \le \frac{\epsilon}{J} \tag{5.34}$$

for some constant $r > 0$.

To prove (5.27), using (5.29)-(5.34) we observe that for arbitrary $\epsilon > 0$ and some $\delta > 0$

$$\frac{\sum_{j=i}^{s-1} \lambda_0^*(x_j)\,\omega(x_j)}{\sum_{j=i}^{s-1} \omega(x_j)} = \sum_{m=1}^{5} \frac{\sum_{j=i}^{s-1} \omega(x_j)I_m(x_j)}{\sum_{j=i}^{s-1} \omega(x_j)}\cdot\frac{\sum_{j=i}^{s-1} \lambda_0^*(x_j)\,\omega(x_j)I_m(x_j)}{\sum_{j=i}^{s-1} \omega(x_j)I_m(x_j)} > J \tag{5.35}$$

for $\delta$ sufficiently large relative to $\epsilon$ and $W$. Hence (5.27), which implies (5.24), holds.

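It is awkward to deal with $\lambda_I^*(x)$ to prove Theorem 5.3.1, since $E_n(\exp(z\hat\beta),x)$ is in the denominator of $\lambda_I^*(x)$. We shall therefore define an alternative estimator which has the same asymptotic properties and is easier to handle.

For simplicity, let us define

$$\omega_0(x_j) = (t_{n,j+1} - t_{n,j})E(\exp(z\beta),x_j)$$

and

$$\begin{aligned}
Y_n^{[A,B]}(x) &= \min_{\substack{s \ge i+1 \\ t_{n,s} \le B}}\ \max_{\substack{r \le i \\ t_{n,r} \ge A}} \frac{F_n(t_{n,r}) - F_n(t_{n,s})}{\sum_{j=r}^{s-1} (t_{n,j+1} - t_{n,j})E(\exp(z\beta),x_j)} \\
&= \min_{\substack{s \ge i+1 \\ t_{n,s} \le B}}\ \max_{\substack{r \le i \\ t_{n,r} \ge A}} \frac{F_n(t_{n,r}) - F_n(t_{n,s})}{\sum_{j=r}^{s-1} \omega_0(x_j)} \\
&= \min_{\substack{s \ge i+1 \\ t_{n,s} \le B}}\ \max_{\substack{r \le i \\ t_{n,r} \ge A}} \frac{\sum_{j=r}^{s-1} Y_n(x_j)\,\omega_0(x_j)}{\sum_{j=r}^{s-1} \omega_0(x_j)}.
\end{aligned}$$

From (5.11), we define

$$Y_n = Y_n(x) = \frac{F_n(t_{n,i}) - F_n(t_{n,i+1})}{(t_{n,i+1} - t_{n,i})E(\exp(z\beta),x)}.$$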
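Our first objective is to show that Theorem 5.2.1 holds with $\lambda_I^{*[A,B]}(x)$ instead of $\lambda_0^*(x)$. We have shown that $Y_n(x)$ has the same asymptotic distribution as $\lambda_0^*(x)$ in Lemma 5.2.1. If we can show that for $0 < a < \frac{1}{3}$

$$\sup_{A \le x_r < x < x_s \le B} n^{(1-a)/2} \left|\frac{F_n(t_{n,r}) - F_n(t_{n,s})}{\sum_{j=r}^{s-1} \omega(x_j)} - \frac{F_n(t_{n,r}) - F_n(t_{n,s})}{\sum_{j=r}^{s-1} \omega_0(x_j)}\right| \to 0 \tag{5.36}$$

in probability as $n$ goes to $\infty$, then it suffices to show

$$\lim_{n\to\infty} \Pr[Y_n^{[A,B]}(x) \ne Y_n(x)] = 0$$

to prove Theorem 5.3.1. The above would follow because

$$\begin{aligned}
Y_n^{[A,B]}(x) &= \min_{\substack{s \ge i+1 \\ t_{n,s} \le B}}\ \max_{\substack{r \le i \\ t_{n,r} \ge A}} \frac{F_n(t_{n,r}) - F_n(t_{n,s})}{\sum_{j=r}^{s-1} \omega_0(x_j)} \\
&= \min_{\substack{s \ge i+1 \\ t_{n,s} \le B}}\ \max_{\substack{r \le i \\ t_{n,r} \ge A}} \left\{\frac{F_n(t_{n,r}) - F_n(t_{n,s})}{\sum_{j=r}^{s-1} \omega(x_j)} + \left(\frac{F_n(t_{n,r}) - F_n(t_{n,s})}{\sum_{j=r}^{s-1} \omega_0(x_j)} - \frac{F_n(t_{n,r}) - F_n(t_{n,s})}{\sum_{j=r}^{s-1} \omega(x_j)}\right)\right\}.
\end{aligned} \tag{5.37}$$

Therefore, (5.36) would imply that

$$n^{(1-a)/2}\,|\lambda_I^{*[A,B]}(x) - Y_n^{[A,B]}(x)| \to 0 \tag{5.38}$$

in probability as $n$ goes to $\infty$.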



Lemma  5.3.3  For  0 < a < | 


i — a 

sup  n 2 

A<Xr<X<Xs<B 


Fn(tn,s)  - Fn(tn<r) 


fli(<n,»)  ~ ^n(*n,r) 


o 


(5.39) 


in  probability  as  n goes  to  oo. 


Proof  of  Lemma  5.3.3  Note  that 


n 


1 — a 
2 


Fn(tn,s ) ~ Fn(tnir) 

Ej= lMXj) 


1 —a 

— n 2 


_ Fn{tn<3)  - Fn(tn,r) 

EjZr  Wo(Xj) 


< 


n 2 Fn(tnr)  — -F(in>a)  + ,F(£nir)| 

|E-=Mxi)-EjiNo(^)l 
E£«(*i)E£  «*>(*;) 

ES«(*i)ESwb(*i) 


+n  2“  |F(<n,s)  - F(tntr) 


= L + IL 


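Since $\sqrt{n}\,(E_n(\exp(z\beta), x) - E(\exp(z\beta), x))$ converges to a Gaussian process (Tsiatis, 1981),

$$\left| \frac{1}{s-r} \sum_{j=r}^{s-1} \sqrt{n}\,[E_n(\exp(z\beta), x_j) - E(\exp(z\beta), x_j)] \right| \le \sup_{A \le x \le B} \left| \sqrt{n}\,[E_n(\exp(z\beta), x) - E(\exp(z\beta), x)] \right| = O_p(1). \qquad (5.40)$$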
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Note  that  since 


5—1 


5—1 


E^o(xi) 

j=r  j-r 


(5.41) 


in probability as $n$ goes to $\infty$, as in (3.11), where $d < E(\exp(z\beta), x) < D$ for positive
constants $d$ and $D$, it follows that

$$c n^{-\alpha}(s-r)d < \sum_{j=r}^{s-1} \omega_0(x_j) < c n^{-\alpha}(s-r)D$$

and

$$c n^{-\alpha}(s-r)d < \sum_{j=r}^{s-1} \omega(x_j) < c n^{-\alpha}(s-r)D.$$


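Next, using (5.40) we observe that

$$\left| \sum_{j=r}^{s-1} \omega(x_j) - \sum_{j=r}^{s-1} \omega_0(x_j) \right| = c n^{-\alpha-\frac{1}{2}} \left| \sum_{j=r}^{s-1} \sqrt{n}\,(E_n(\exp(z\beta), x_j) - E(\exp(z\beta), x_j)) \right|$$

$$= c n^{-\alpha-\frac{1}{2}} (s-r) \left| \frac{1}{s-r} \sum_{j=r}^{s-1} \sqrt{n}\,(E_n - E) \right|$$

$$= c(s-r)\, n^{-\alpha-\frac{1}{2}}\, O_p(1) \qquad (5.42)$$

and $\sqrt{n}\,(F_n(x) - F(x)) = O_p(1)$.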

Using the above results, the first term (I) becomes

$$(I) \le n^{\frac{1-\alpha}{2}}\, \frac{n^{-\frac{1}{2}} O_p(1)\; c(s-r)\, n^{-\alpha-\frac{1}{2}} O_p(1)}{c^2 (s-r)^2 n^{-2\alpha} d^2} = n^{\frac{\alpha-1}{2}}\, \frac{O_p(1)}{c(s-r)d^2} \le n^{\frac{\alpha-1}{2}}\, \frac{O_p(1)}{c d^2}, \qquad (5.43)$$

which converges to 0 in probability as $n$ goes to $\infty$.


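For the last term, the mean value theorem gives

$$|F(t_{n,s}) - F(t_{n,r})| = (t_{n,s} - t_{n,r})\, |F^{(1)}(t^*)|,$$

where $t^*$ is between $t_{n,s}$ and $t_{n,r}$, and

$$t_{n,s} - t_{n,r} = (s-r)\, c n^{-\alpha}.$$

Similarly, we show that the last term (II) converges to 0 in probability as follows:

$$(II) \le n^{\frac{1-\alpha}{2}} (s-r)\, c n^{-\alpha}\, |F^{(1)}(t^*)|\, \frac{c(s-r)\, n^{-\alpha-\frac{1}{2}} O_p(1)}{(s-r)^2 d^2 c^2 n^{-2\alpha}} = n^{-\frac{\alpha}{2}}\, |F^{(1)}(t^*)|\, O_p(1), \qquad (5.44)$$

which converges to 0 in probability as $n$ goes to $\infty$.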

Using  the  previous  lemmas  we  shall  prove  for  fixed  x, 


Jim  Pr[K„(x)  ? A^'M}  = 0 

Proof of Theorem 5.3.1 Throughout the proof, we assume that the range
of time is bounded, since we have proven that the isotonic estimators of $\lambda_0(x)$ are
equivalent whether the range of time is finite or infinite. For fixed
$x_i \in [t_{n,i}, t_{n,i+1})$ we define the isotonic estimator,


A/  ‘(xi)  = mm  max 

a>t  + l r<* 


Ej=r  *;(*,> o(x,-) 

r~l  M*i) 
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Since 


mm  max 

3>i+l  r<« 

tn,s<B  tntr>A. 


Ej= r K(gjVo(Xj) 
£j=J  wb(*j) 


7^  K(X,- 


implies  either 


3 m > 1 3 /[>i<tn,r<t„j3<s] 


ESV*i) 


< K(ar.- 


or 


3 m > 1 3 /[J<,n,,<,,,<a)£;°‘-”F'l(Xi)“ofe)  > K(x, 

Di=.-m  ^o(zj) 


or  6o£/i,  it  suffices  to  show 


Jim  Pr{3  m > 1 3 I[A<tn,r<tn,.<B] 


££?  YnjxjfaiXj) 


< Yn{Xi)}  = 0 (5.45) 


and 


J™  Pr  {3  "*  > 1 3 ^<,,,,<,,,,<B|g%"r"(Xi|“o(Xj)  > n <*,•)}  = 0 (5.46) 

Ej=i-m  “0 (Xj)  ’ 


Due to the similarity of the proofs of (5.45) and (5.46), we shall only derive (5.45).
Since


E-y  *;(*>  0fo) 


3 m > 1 9 I[A<tn,r<tn,,<B]-  J ^m+i.  "rJl  < En(xi 


_ | | / t E>=i  Ei(^jVo(xj)  "i 

m=l  ^ L^j—{  U0{Xj)  J 
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we  shall  show 


iis,  £ < *(*.•)}  = o. 

j=!  +?=,•  uo(Zj) 


m= 1 


Let  us  define 

= u0 {Xj) 

3 Er=t^o(^)' 

By  Chebyshev’s  theorem,  it  follows  that 


m+t 


Pr<E  < y-fe)}  = 


m+« 

= Pr{E(K(^)-r„(x,))Pi  <0} 


< - y,(xQ)] 

{I :?+'PiE{Y„(xj)  - r„(x,)]}2 

First,  let  us  simplify  VarK£j'' Pj(Y„(xj)  - Y„(x ,))]. 

It  is  well-known  that 


m+t 


m+t 


m+t 


\ «r[^  pj(Yn(xj)  y^,(x,))]  < 3(Var[^  pjYn(xj)\  -f  V ar[^2  PjEi(^i)]) 


j=t 


:=i 


j=t 


^/ar[EjLt' is  simplified  as 


vMEfvKfe)]  = tME  , F’,(tnj') ~ f"(t-w)] 

j=<  Ej=i  wb(x,-)  wb(*j) 


j=« 


= V ar[ 


Fn(tn,i)  Fn{f-n,m+i ) 


(5.47) 


(5.48) 
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< { 1 121 


Z?=i'u  o(xj)  n 


y-{F(tnii)  - F(tn,m+t)} 


< 


n(m  + 1 )2d2n~2ac2 


men  “|F^^(x*)| 


^ ^ (5-49) 

for some constant $c_1$, where $d < E(\exp(z\beta), x) < D$ for some positive constants $d$ and
$D$. Similarly, $\mathrm{Var}[\sum_{j=i}^{m+i} p_j Y_n(x_i)]$ is also simplified from (5.15) as


m+i 


m+i 


Var^pMxi)}  = (£Pj)2Far[yn(:rt)] 


3-1 


J=i 

nl~a 


(5.50) 


Using  (5.13),  and  noting 


p2  _ f l2  . B 

m+J  Ie^V'^^)/ 


for some constant $B > 0$, and with $[\,\cdot\,]$ denoting the greatest integer not exceeding the quantity
within the brackets, it follows that


m+i  m+i 

{£p;£[K(*;)-rn(st)]}2  > { £ PjE[Yn(xj)  - rn(xt)]}2 

)=.+[?] 


= PlHE*{Y„(xIHfi)  - n(x,)]([y])2 
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> 


B 


nr 


i~2ac2n~2a 


a?])2, 


m 


(5.51) 


for some constant $c_3$.

Hence by (5.48), (5.49) and (5.50),


prfr  Yn(xi)uo(xj)  ^ 

j=i  ^j=i  U0[Xj) 


C4 

n1~3am2 


for  some  constant  c4.  It  follows  that  for  0 < a < | 


m= 1 


m+‘  Yn(Xj)u 0(Xj) 

U E7=t‘wo(xj) 


lim  £ Pr{E  =3£T):”  < Yn(*i)}  = 0. 


Hence  it  follows  that  the  asymptotic  distributions  of  A|(x)  and  Aq(;c)  are  identical. 
The  following  result  is  an  immediate  consequence  of  Theorem  5.3.1. 


Theorem  5.3.2  Under  the  same  assumptions  as  in  Theorem  5.3.1, 
for\<a<\ 


[ jraiL) ]^(exp(*ft),  x)n  1 2° ( A/(*)  - Ao(x)) 

and  for  j < a < | 


[j^]^E{exp{z^x)n^  (X}(x)  - \0(x)  - 


c2n  2“F(3)(x)  \ 
24E(exp(z/3),x)J 


are  asymptotically  distributed  as  N( 0, 1). 


Proof of Theorem 5.3.2 This is an immediate consequence of Theorems 5.2.1 and 5.3.1.


5.4 Determination of Window Size

When we discuss estimation based on windows, the following question arises:
"What is the optimal size of the window?" The answer determines how many windows
we can have in order to obtain the smoothness of the baseline hazard function, taking
into account the mean square error of estimates of the baseline hazard function. The
recommendation on window size is made in terms of the mean square error (Barlow
et al., 1972).

Parzen (1961) discusses the problem of choosing the size of the window in density
estimation. In his paper, he uses the following estimate of the density $f(x)$,

$$f_n(x) = \frac{F_n(x + h) - F_n(x - h)}{2h},$$

where $h$ is a suitably chosen positive number. After reviewing the statistical properties
of $f_n(x)$, especially the mean square error of $f_n(x)$, he finds the value of $h$ which
minimizes the mean square error for a fixed value of $n$. From (5.13) and (5.15) we
can see that the mean square error of $\lambda_I^*(x)$ in estimating $\lambda_0(x)$ is

$$\mathrm{MSE}[\lambda_I^*(x)] = \mathrm{Var}[\lambda_I^*(x)] + \mathrm{Bias}^2[\lambda_I^*(x)] = \frac{F^{(1)}(x)}{c\,E^2(\exp(z\beta), x)}\, n^{\alpha-1} + O(n^{-4\alpha}). \qquad (5.52)$$
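As an illustration of the difference-quotient density estimate described above, the following minimal Python sketch evaluates $[F_n(x+h) - F_n(x-h)]/(2h)$ from an empirical distribution function. The function names `ecdf` and `parzen_density` and the uniform test sample are assumptions introduced here for illustration; they do not appear in the original text.

```python
import numpy as np

def ecdf(sample, x):
    """Empirical distribution function F_n(x): fraction of observations <= x."""
    sample = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(sample, x, side="right") / sample.size

def parzen_density(sample, x, h):
    """Difference-quotient density estimate [F_n(x + h) - F_n(x - h)] / (2h)."""
    return (ecdf(sample, x + h) - ecdf(sample, x - h)) / (2.0 * h)

# Hypothetical check: a Uniform(0, 1) sample has density 1 on (0, 1).
rng = np.random.default_rng(0)
sample = rng.uniform(size=20_000)
estimate = parzen_density(sample, x=0.5, h=0.1)
print(float(estimate))
```

A smaller `h` reduces the bias of the estimate but increases its variance, which is the trade-off the mean square error criterion balances.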


If $\mathrm{MSE}^{1/2}[\lambda_I^*(x)]$ is asymptotically smaller than the window size $O(n^{-\alpha})$, then it
is intuitively clear that asymptotically $\lambda_I^*(t_{n,i}) < \lambda_I^*(t_{n,i+1})$. This follows because
$|\lambda_0(t_{n,i+1}) - \lambda_0(t_{n,i})| = O(n^{-\alpha})$ (when $\lambda_0(x)$ has a positive first derivative in a
neighborhood of $x$).

Clearly, $\mathrm{MSE}^{1/2}[\lambda_I^*(x)] = o(n^{-\alpha})$ if $\alpha < 1/3$. This follows from

$$\sigma[\lambda_I^*(x)] = \mathrm{Var}^{1/2}[\lambda_I^*(x)] = O(n^{\frac{\alpha-1}{2}}) = o(n^{-\alpha})$$

provided $\frac{\alpha-1}{2} < -\alpha$, or $\alpha < 1/3$.

Returning to our problem, we can show that the mean square error of $\lambda_I^*(x)$ is
minimized when we choose $\alpha = 1/5$. To see this, recall from (5.13) and (5.15) that the mean
square error is approximately


MSE[AJ(x)]  % 


Ao(z)rca  1 

cF^ix)  + 


' c2FW(x)  l2 
.24E(exp(z/3),x). 


—4a 


The minimum is achieved by choosing $\alpha = 1/5$, where MSE is treated as a function of
$\alpha$. Unfortunately, the optimal size of the window still depends on the value of $c$. To
solve this problem, note that if we choose $t_{n,i} = y_{u(i)}$, the order statistics from a
sample of size $n$, then by the principle of maximum likelihood we obtain the same
estimator as the one derived in Chapter 3. We modify the window size by choosing
$t_{n,i} = y_{u([i n^{4/5}])}$, where $[\,\cdot\,]$ denotes the greatest integer of the quantity within the
brackets. This yields

$$y_{u([(i+1) n^{4/5}])} - y_{u([i n^{4/5}])} = O_p(n^{-\frac{1}{5}}),$$

so that the recommended requirement that the mean square error as a function of $\alpha$
is minimized at $\alpha = 1/5$ is satisfied.
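The choice $\alpha = 1/5$ can be checked by a short rate-balancing computation. This is only a sketch: $V(x)$ and $B(x)$ are shorthand, introduced here for illustration, for the factors of the variance and squared-bias terms that do not depend on $n$.

```latex
\mathrm{MSE}(\alpha) \approx V(x)\, n^{\alpha - 1} + B(x)\, n^{-4\alpha},
\qquad V(x) > 0,\ B(x) > 0.

% The variance term shrinks as \alpha decreases and the squared-bias term
% shrinks as \alpha increases, so the sum has the smallest order in n when
% the two exponents agree:
\alpha - 1 = -4\alpha
\;\Longrightarrow\;
\alpha = \tfrac{1}{5},
\qquad
\mathrm{MSE}\bigl(\tfrac{1}{5}\bigr) = O\bigl(n^{-4/5}\bigr).
```

With $\alpha = 1/5$ the window length is of order $n^{-1/5}$, which corresponds to roughly $n^{4/5}$ order statistics per window, in line with the modified choice of $t_{n,i}$ above.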


CHAPTER  6 

SIMULATION  AND  CONCLUSION 


Suppose  we  have  a single  binary  covariate  model  (two  different  groups  of  cancer 
patients  e.g.,  male  and  female),  and  we  are  interested  in  the  instantaneous  failure 
rate  at  a given  time  conditional  upon  survival  up  to  this  given  time.  In  terms  of 
Cox’s  regression  model,  it  is  enough  to  consider  the  regression  parameter  and  the 
baseline  hazard  function.  This  chapter  contrasts  the  three  different  estimators  of  the 
baseline  hazard  function  which  are  found  in  Chapter  3 and  Chapter  5. 


6.1  Estimators 

There are three estimators of $\lambda_0(t)$ for fixed $t$. We assume that $\lambda_0(t)$ increases
monotonically, and $\hat\beta$ is the maximum likelihood estimator of $\beta$ obtained from the marginal
likelihood.

(i) The first estimator of $\lambda_0(t)$, denoted by $E_1$, is obtained using Breslow's
approach (1974), which approximates an increasing function by a nondecreasing
step function. The joint likelihood of $\beta$ and the $\lambda_j$'s is used. The
maximum likelihood estimator of $\lambda_0(t)$ is

$$\hat\lambda_j = \frac{1}{(y_{u(j)} - y_{u(j-1)}) \sum_{l \in R(y_{u(j)})} \exp(z_l \hat\beta)}. \qquad (6.1)$$

(ii) The second estimator of $\lambda_0(t)$, denoted by $E_2$, is derived using the isotonic
regression method with ordered failure observations. The isotonic
estimator of $\lambda_0(t)$ is

$$\lambda_j^* = \max_{s \le j} \min_{t \ge j} \frac{t - s + 1}{\sum_{i=s}^{t} (y_{u(i)} - y_{u(i-1)}) \sum_{l \in R(y_{u(i)})} \exp(z_l \hat\beta)}. \qquad (6.2)$$

(iii) The final estimator of $\lambda_0(t)$, denoted by $E_3$, is derived in a similar
fashion to the second estimator, except that a window is used with ordered
failure observations. The isotonic estimator based on the window is

$$\lambda_I^*(x) = \min_{s \ge i+1} \max_{r \le i} \frac{F_n(t_{n,r}) - F_n(t_{n,s})}{\sum_{j=r}^{s-1} (t_{n,j+1} - t_{n,j})\, E_n(\exp(z\hat\beta), x_j)}, \qquad (6.3)$$

where $E_n(\exp(z\beta), x) = \frac{1}{n} \sum_{i=1}^{n} \exp(z_i \beta)\, I_{[T_i \ge x]}$.
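The relation between the first two estimators can be sketched in Python. The unrestricted estimate in (6.1) is $1/w_j$ with $w_j = (y_{u(j)} - y_{u(j-1)}) \sum_{l \in R(y_{u(j)})} \exp(z_l \beta)$, and the max-min expression in (6.2) is the weighted isotonic regression of these values with weights $w_j$, which the pool-adjacent-violators algorithm computes. The function names, the toy data, the fixed value of `beta`, and the convention $y_{u(0)} = 0$ are assumptions made for this illustration; ties among failure times are not handled.

```python
import numpy as np

def breslow_weights(times, events, z, beta):
    """Ordered failure times and w_j = (y_(j) - y_(j-1)) * sum_{l in R_j} exp(z_l * beta)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    z = np.asarray(z, dtype=float)
    fail = np.sort(times[events])                    # ordered uncensored times
    gaps = np.diff(np.concatenate(([0.0], fail)))    # assumes y_(0) = 0
    risk = np.array([np.exp(beta * z[times >= t]).sum() for t in fail])
    return fail, gaps * risk

def pava(g, w):
    """Weighted nondecreasing least-squares fit by pool-adjacent-violators."""
    blocks = []  # each block holds [weighted mean, total weight, length]
    for gi, wi in zip(g, w):
        blocks.append([gi, wi, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    return np.concatenate([np.full(c, m) for m, _, c in blocks])

def max_min(g, w):
    """Brute-force max over s <= j of min over t >= j of the weighted mean of g[s..t]."""
    n = len(g)
    return np.array([
        max(min(np.average(g[s:t + 1], weights=w[s:t + 1]) for t in range(j, n))
            for s in range(j + 1))
        for j in range(n)
    ])

# Hypothetical toy data: six subjects, one censored (events == False).
times = np.array([0.2, 0.5, 0.6, 0.9, 1.1, 1.4])
events = np.array([True, True, False, True, True, True])
z = np.array([1, 0, 1, 0, 1, 0])
fail, w = breslow_weights(times, events, z, beta=0.5)
e1 = 1.0 / w        # unrestricted step-function estimate, as in (6.1)
e2 = pava(e1, w)    # monotone estimate, as in (6.2)
print(np.round(e1, 3), np.round(e2, 3))
```

Because the weighted mean of $1/w_j$ over a block $s, \ldots, t$ equals $(t - s + 1)/\sum_{i=s}^{t} w_i$, the pooled values agree with the max-min expression in (6.2).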


6.2  Procedure  for  the  Simulation 
The  outline  of  the  simulation  is  as  follows: 

(1) Generate four sets of 50 random numbers from a uniform distribution on
(0,1), and call them $U1_i$, $U2_i$, $U3_i$ and $U4_i$ for $i = 1, \ldots, 50$ (see Table
6.1).

(2) Obtain the survival and censoring data $Y1_i$, $Y2_i$, $Y3_i$ and $Y4_i$ by converting
$U1_i$, $U2_i$, $U3_i$ and $U4_i$ using $Yk_i = F^{-1}(Uk_i)$ for $k = 1, \ldots, 4$, where

$F(x) = \exp(-x^2 e^2)$ for $U1_i$

$F(x) = \exp(-x)$ for $U2_i$

$F(x) = \exp(-x^2)$ for $U3_i$

and

$F(x) = \exp(-2x)$ for $U4_i$, $i = 1, \ldots, 50$

respectively.

Note that Cox's regression model is given as $\lambda(x; z) = 2x \exp(2z)$. When
$F(x) = \exp(-x^2 e^2)$, the corresponding Cox regression model is $\lambda(x; 1) =
2x \exp(2)$. When $F(x) = \exp(-x^2)$, the corresponding Cox regression
model is $\lambda(x; 0) = 2x$. The baseline hazard function is $2t$, which increases
monotonically in $t$.

(3) Form Group 1 with $Y1_i$ in such a way that censored data are created by
comparing $Y1_i$ with $Y2_i$. If $Y1_i > Y2_i$, then $Y1_i$ is a censored observation;
otherwise $Y1_i$ is an uncensored observation. Similarly form Group 2 with
$Y3_i$ and $Y4_i$.

(4) Obtain the maximum likelihood estimator $E_1$ of $\lambda_0(t)$ using (6.1) and the
isotonic estimator $E_2$ based on the ordered observations using (6.2).

(5) Obtain the isotonic estimator $E_3$ with the optimal size of window using
(6.3). The optimal size of window is determined by adjusting the order
statistics so that

size of the $i$th window $= y_{u([i n^{4/5}/c])} - y_{u([(i-1) n^{4/5}/c])}$

for appropriate $c > 0$. After completing 100 pilot simulations with the
values of $c$ = 36, 33, 30, 27 and 24, we found that the mean square error
is minimized at $c = 36$. Since the sample size of each simulation is 100,
the window size is $n^{4/5}/36 = 1.105$. This implies that most of the observations
must be used to obtain the isotonic estimator which gives the minimum
mean square error.

(6) Repeat steps (1) to (5) 1000 times with $c = 36$. Noting that all
estimators are functions of time $t$, obtain the three estimators at $p(i)$,
where $p(i)$ is defined by

$p(i) = \sqrt{-\log(0.95 - 0.05(i - 1))}$

for $i = 1, \ldots, 19$. In other words, $p(i)$ is the $100(0.95 - 0.05(i - 1))$
percentile of the baseline survival function

$S_0(t) = \exp(-t^2)$.

(7) Find the mean square error of $E_1$, $E_2$ and $E_3$. We define the mean square
error by

$$\mathrm{MSE}[E_k(t)] = \frac{1}{1000} \sum_{m=1}^{1000} \{E_k^{(m)}(t) - \lambda_0(t)\}^2 \quad \text{for } k = 1, 2, 3,$$

where $E_k^{(m)}(t)$ is the value of $E_k$ in the $m$th replication and the values of $t$
are the $p(i)$'s. Note that $\lambda_0(t) = 2t$ is assumed.

(8) Find the relative efficiencies of $E_2$ and $E_3$ versus $E_1$. We define the relative
efficiency as the ratio of the mean square errors of two different estimators for
fixed $p(i)$. $E_2$ and $E_3$ are the isotonic estimators while $E_1$ is the maximum
likelihood estimator. Table 6.1 gives the relative efficiencies of the isotonic
estimators as compared to the maximum likelihood estimator. (Also, see
Figure 6.1.)

(9) To investigate the relative efficiencies of $E_2$ and $E_3$ over $E_1$ for extended
cases, let us assume that for $r \ge 1$

$\lambda_0(x) = r x^{r-1}$

so that the baseline survival function is

$S_0(x) = \exp(-x^r)$.

(10) Repeat steps (2) and (3) using

$F(x) = \exp(-x^r e^2)$ for $U1_i$

$F(x) = \exp(-x)$ for $U2_i$

$F(x) = \exp(-x^r)$ for $U3_i$

and

$F(x) = \exp(-2x)$ for $U4_i$

(11) Find the maximum likelihood estimator and the isotonic estimators with the
data generated in step (10).

(12) As in step (6), obtain the three estimators of $\lambda_0(t)$ at fixed $p(i)$, which is
defined by

$p(i) = \{-\log(0.95 - 0.05(i - 1))\}^{1/r}$

for $i = 1, \ldots, 19$ and $r \ge 1$.

(13) Repeat steps (7) and (8) 1000 times with $r$ = 1.0, 1.5, 2.5 and 3.0.

(14) The relative efficiencies of $E_2$ to $E_1$ and $E_3$ to $E_1$ for given values of $r$ are
presented in terms of the probabilities that an individual survives up to
time $t$. Hence we can see the validity of the relative efficiency measure
as the survival probabilities vary. When $i = 1$, the relative efficiency is
important because the probability of survival to time $p(1)$ is 95%. When
$i = 19$, the efficiency is not meaningful, since the probability that an
individual survives up to time $p(19)$ is only 5%. (See Tables 6.2-6.5 and
Figures 6.2-6.5.)
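The data-generation steps (1) to (3), their extension in step (10), and the evaluation grid of steps (6) and (12) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names, the seed, and the use of NumPy's generator are illustrative, and recording the smaller of the failure and censoring times as the observed time is an assumption, since the text only states which observations count as censored.

```python
import numpy as np

def simulate_two_groups(n_per_group=50, r=2.0, beta=2.0, seed=0):
    """One replication: observed times, event indicators and the binary covariate z."""
    rng = np.random.default_rng(seed)
    # Uniforms on (0, 1]; taking 1 - U avoids log(0).
    u1, u2, u3, u4 = 1.0 - rng.random((4, n_per_group))
    # Solve F(y) = u for y, where F is the stated function exp(-...).
    y1 = (-np.log(u1) / np.exp(beta)) ** (1.0 / r)   # F(x) = exp(-x^r e^beta), z = 1
    y2 = -np.log(u2)                                  # F(x) = exp(-x), censoring, group 1
    y3 = (-np.log(u3)) ** (1.0 / r)                   # F(x) = exp(-x^r), z = 0
    y4 = -np.log(u4) / 2.0                            # F(x) = exp(-2x), censoring, group 2
    # Assumption: the observed time is the smaller of failure and censoring time.
    times = np.concatenate([np.minimum(y1, y2), np.minimum(y3, y4)])
    events = np.concatenate([y1 <= y2, y3 <= y4])     # True = uncensored failure
    z = np.concatenate([np.ones(n_per_group), np.zeros(n_per_group)])
    return times, events, z

def percentile_grid(r=2.0):
    """Evaluation points p(i), i = 1..19, solving exp(-p^r) = 0.95 - 0.05(i - 1)."""
    q = 0.95 - 0.05 * np.arange(19)
    return (-np.log(q)) ** (1.0 / r)

times, events, z = simulate_two_groups()
grid = percentile_grid()
print(times.size, int(events.sum()), np.round(grid[:3], 4))
```

With `r=2.0` and `beta=2.0` this reproduces the setting of steps (1) to (6); other values of `r` give the extended cases of steps (9) to (12).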

6.3  Conclusion 

The  sum  of  mean  square  errors  over  t values  is  applied  to  determine  the  better 
estimators.  We  discuss  the  general  findings  of  the  simulation  in  this  section. 

The main problem is to determine the optimal size of the window which yields
the minimum of mean square errors of $E_3$ for fixed $t$ values. (Refer to Section 5.4.)
We assume that for $0 < \alpha < 1$ and some positive constant $c > 0$,

$$t_{n,i+1} - t_{n,i} = c n^{-\alpha} \qquad (6.4)$$

to derive $E_3$. It turns out that the window size must be proportional to $n^{-1/5}$, where $n$
is the sample size. Because the constant $c$ is unknown, we have to modify the optimal
size of the window, which depends on the ordered observations.

Using definition (6.3), we consider the isotonic estimators based on the window
only when the right limit of the last window is less than the largest failure time. We
do not have estimators between the right limit of the last window and the largest
failure time, where the isotonic estimators $E_3$ are not defined. We are interested
in three estimators of $\lambda_0(t)$ when $\lambda_0(t)$ increases monotonically. We anticipate that
$E_2$ and $E_3$ are better estimators than $E_1$, since $E_2$ and $E_3$ are obtained using the
assumption that $\lambda_0(t)$ increases monotonically. We define the relative efficiency of $E_i$
versus $E_1$ by

$$R_i = \frac{\text{sum of squared errors of } E_1}{\text{sum of squared errors of } E_i}$$

for $i = 2, 3$. The relative efficiency is presented in terms of the probabilities that an
individual survives up to time $t$. Hence we can see the validity of the efficiency as
the survival probability varies. For example, when $i = 19$, the efficiency draws little
attention, since the probability that one survives up to time $p(19)$ is only 5%.
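The relative efficiency defined above can be computed as in the following Python sketch. The function name and the convention of skipping replications in which either estimate is undefined (coded as NaN) are assumptions made for illustration; the text does not state how undefined estimates enter the sums.

```python
import numpy as np

def relative_efficiency(e_ref, e_alt, truth):
    """Sum of squared errors of the reference estimator divided by that of the alternative."""
    e_ref = np.asarray(e_ref, dtype=float)
    e_alt = np.asarray(e_alt, dtype=float)
    # Assumption: use only replications where both estimates are defined.
    ok = np.isfinite(e_ref) & np.isfinite(e_alt)
    sse_ref = np.sum((e_ref[ok] - truth) ** 2)
    sse_alt = np.sum((e_alt[ok] - truth) ** 2)
    return sse_ref / sse_alt

# Hypothetical replications at one time point where the true hazard is 2.0.
e1 = [1.0, 3.0, np.nan]
e3 = [1.5, 2.5, 2.0]
print(relative_efficiency(e1, e3, truth=2.0))  # 4.0
```

A value above 1 means that the alternative estimator has the smaller sum of squared errors at that time point.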

The isotonic regression method for estimating $\lambda_0(t)$ using the window outperforms
the maximum likelihood estimator over the whole range when $\lambda_0(t)$ is assumed to be an
increasing function. The isotonic regression method based upon the ordered
failure observations, however, is better than the maximum likelihood estimator only up to
the time at which an individual survives with probability more than 0.5; it is just as
good as the maximum likelihood estimator when the probability of survival is at
most 0.5 in terms of the baseline survival function. The isotonic regression method
with a window is more efficient than the isotonic regression method without a window
when the survival probability of an individual is at most 0.5, while it does not show
any significant improvement over the isotonic regression method without a window
when the survival probability of an individual is greater than 0.5.

We conclude that among the three estimators of the baseline hazard function of
Cox's regression model, the isotonic regression method with a window is the most
efficient estimator for the whole range.


Table 6.1. Relative Efficiencies of $E_2$ to $E_1$ and $E_3$ to $E_1$ when $r = 1.0$

Time       R_1         R_2      No. of Estimators
   1     7.69198     0.62580     1000
   2    53.20185     5.86470     1000
   3    34.00603     4.76669     1000
   4     3.96048     0.59525     1000
   5     5.14114     0.73327     1000
   6     1.91799     0.29948     1000
   7     2.03008     0.36398     1000
   8     3.59762     0.92791     1000
   9     3.77598     1.15719     1000
  10     2.11634     0.71342      999
  11     1.82748     0.76406      995
  12     0.96547     0.55186      964
  13     0.83177     0.69434      896
  14     0.84174     0.68594      756
  15     0.64364     0.26801      562
  16     0.96873     2.97194      332
  17     0.89445     0.15094      163
  18     0.17965     1.4E-02       50
  19     0.14717     2.5E-03        6

[Figure: efficiencies of isotonic estimators versus maximum likelihood estimator when r = 1.0, plotted against the percentile of the baseline survival function]


Figure 6.1. Efficiencies of $E_2$ and $E_3$ versus $E_1$ when $r = 1.0$


Table 6.2. Relative Efficiencies of $E_2$ to $E_1$ and $E_3$ to $E_1$ when $r = 1.5$

Time       R_1         R_2      No. of Estimators
   1     1.93105     0.51494     1000
   2     2.84304     1.00692     1000
   3     5.45463     2.62717     1000
   4     7.35513     4.36878     1000
   5     5.53266     3.75795     1000
   6     3.87954     3.31796     1000
   7     2.84534     2.80851     1000
   8     7.21880     7.90580     1000
   9     1.92854     3.22761     1000
  10     8.94882    29.75328      998
  11     2.99180     8.50156      990
  12     1.14700     4.21029      954
  13     0.72456     4.32069      905
  14     1.08323     7.25389      805
  15     0.94693    22.04793      664
  16     0.74733     3.67969      491
  17     1.02179    18.34358      314
  18     0.81902     1.54799      151
  19     1.70782     0.24051       44

[Figure: efficiencies of isotonic estimators versus maximum likelihood estimator when r = 1.5, plotted against the percentile of the baseline survival function]


Figure 6.2. Efficiencies of $E_2$ and $E_3$ versus $E_1$ when $r = 1.5$


Table 6.3. Relative Efficiencies of $E_2$ to $E_1$ and $E_3$ to $E_1$ when $r = 2.0$

Time       R_1         R_2      No. of Estimators
   1    12.19570     4.14622     1000
   2     2.67036     1.05323     1000
   3     3.36746     1.58910     1000
   4     2.11647     1.30644     1000
   5     2.57971     1.83293     1000
   6     2.75637     2.44174     1000
   7     6.74682     7.19189     1000
   8    21.93844    28.44507      999
   9     2.64356     4.48147      999
  10     0.95675     2.65322      996
  11     1.20568     4.18544      984
  12     1.04331     3.78065      950
  13     0.82456     4.42452      908
  14     1.08746     4.60336      823
  15     0.72260     6.67253      713
  16     0.94320     6.09248      557
  17     1.04300    27.19504      414
  18     0.92408     1.28166      231
  19     2.68491     0.52304       81

[Figure: efficiencies of isotonic estimators versus maximum likelihood estimator when r = 2.0, plotted against the percentile of the baseline survival function]


Figure 6.3. Efficiencies of $E_2$ and $E_3$ versus $E_1$ when $r = 2.0$


Table 6.4. Relative Efficiencies of $E_2$ to $E_1$ and $E_3$ to $E_1$ when $r = 2.5$

Time       R_1         R_2      No. of Estimators
   1     4.24194     1.51224     1000
   2     2.46923     1.17791     1000
   3     4.12568     2.74376     1000
   4     2.02903     1.51935     1000
   5    13.99279    12.75204     1000
   6    20.28677    20.33314     1000
   7     3.81523     4.481471    1000
   8     5.13366     7.35062      999
   9     2.27329     5.34532      999
  10     1.42994     6.60936      995
  11     1.52489     6.82602      980
  12     1.27592     5.57158      948
  13     1.02940    23.24822      911
  14     0.84960     4.03035      835
  15     0.92992     9.11620      735
  16     1.10174    12.22548      604
  17     1.02400    33.20784      465
  18     0.94614     1.09369      290
  19     1.20355     2.40673

[Figure: efficiencies of isotonic estimators versus maximum likelihood estimator when r = 2.5, plotted against the percentile of the baseline survival function]


Figure 6.4. Efficiencies of $E_2$ and $E_3$ versus $E_1$ when $r = 2.5$.


Table 6.5. Relative Efficiencies of $E_2$ to $E_1$ and $E_3$ to $E_1$ when $r = 3.0$

Time       R_1         R_2      No. of Estimators
   1     2.46000     1.11768     1000
   2     4.58195     2.55058     1000
   3     4.03790     3.91666     1000
   4     3.81653     4.15003     1000
   5     3.00217     3.24918     1000
   6     2.07541     2.30397     1000
   7     8.53029    10.95093     1000
   8     3.38755     6.22937      999
   9     2.30834     7.19948      999
  10     1.39705    12.99846      993
  11     1.14255     6.43423      976
  12     1.37777     7.00777      946
  13     0.94358    10.39849      912
  14     1.62100    12.27069      845
  15     0.74918     7.36184      748
  16     0.97037    17.39330      635
  17     1.02220    37.40968      498
  18     0.99905     0.95536      315
  19     1.49304     1.39411      127

[Figure: efficiencies of isotonic estimators versus maximum likelihood estimator when r = 3.0, plotted against the percentile of the baseline survival function]


Figure 6.5. Efficiencies of $E_2$ and $E_3$ versus $E_1$ when $r = 3.0$.


APPENDIX  A 
PROOF  OF  LEMMA  2.2.1 


Lemma 2.2.1

Conditional on $\sum_{i=1}^{n} D_{n;i} = m$, $D_{n;1}, \ldots, D_{n;n-1}$ have a uniform distribution over
the area:

$$d_i > 0, \quad i = 1, \ldots, n-1, \qquad \sum_{i=1}^{n-1} d_i < m, \qquad (A.1)$$

under the null hypothesis that $\lambda_0(t)$ is constant.

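Proof of Lemma 2.2.1

It is well known that $D_{n;i}$ ($i = 1, \ldots, n$) are independent and exponentially
distributed with parameter $\lambda$, i.e.,

$$f_{D_{n;i}}(d_i) = \lambda \exp(-\lambda d_i), \quad i = 1, \ldots, n.$$

Therefore $\sum D_{n;i}$ has a gamma distribution with parameters $n$ and $\lambda$, i.e.,

$$f_{\sum D_{n;i}}(d) = \frac{\lambda^n d^{n-1} \exp(-\lambda d)}{(n-1)!}.$$

It is easy to see that the conditional distribution of $D_{n;1}, \ldots, D_{n;n-1}$ given $\sum_{i=1}^{n} D_{n;i} =
m$ is

$$f_{D_{n;1}, \ldots, D_{n;n-1} \mid \sum D_{n;i} = m}(d_1, \ldots, d_{n-1}) = \frac{f_{D_{n;1}, \ldots, D_{n;n}}(d_1, \ldots, d_{n-1},\, m - d_1 - \cdots - d_{n-1})}{f_{\sum D_{n;i}}(m)}$$

$$= \frac{\left[\prod_{i=1}^{n-1} \lambda \exp(-\lambda d_i)\right] \lambda \exp(-\lambda(m - d_1 - \cdots - d_{n-1}))}{\lambda^n m^{n-1} \exp(-\lambda m) / (n-1)!}$$

$$= \frac{(n-1)!}{m^{n-1}}.$$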


Lemma 2.2.2

Let $X_i = D_{n;i}/m$, $i = 1, \ldots, n-1$. Then conditional on $\sum D_{n;i} = m$, $X_1, \ldots, X_{n-1}$
have a uniform distribution over the area:

$$x_i > 0, \quad i = 1, \ldots, n-1, \qquad x_1 + \cdots + x_{n-1} < 1. \qquad (A.2)$$


Proof of Lemma 2.2.2

It is seen that Lemma 2.2.2 is a consequence of Lemma 2.2.1 by a scale change.
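A quick Monte Carlo check of Lemmas 2.2.1 and 2.2.2 can be sketched in Python. The sketch relies on two standard facts rather than on anything stated in the text: for independent exponential variables the normalized vector is independent of the sum, so dividing by the realized sum gives the same law as conditioning on the sum; and under the uniform distribution on the simplex (A.2), $\Pr(X_1 > x) = (1 - x)^{n-1}$. The values of `n`, `lam`, the number of replications and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, lam = 5, 200_000, 2.0

# Independent exponential variables D_{n;1}, ..., D_{n;n} with parameter lam.
d = rng.exponential(scale=1.0 / lam, size=(reps, n))

# Scale by the total; keep the first n - 1 coordinates X_1, ..., X_{n-1}.
x = d[:, : n - 1] / d.sum(axis=1, keepdims=True)

# Under the uniform law on the simplex, P(X_1 > 0.3) = (1 - 0.3)^(n - 1).
empirical = (x[:, 0] > 0.3).mean()
theoretical = (1.0 - 0.3) ** (n - 1)
print(float(empirical), theoretical)
```

The two printed values should agree up to simulation error, and the result does not depend on the value of `lam`, in line with the scale-change argument.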


APPENDIX  B 

PROOF  OF  THEOREM  2.2.2 


Theorem 2.2.2

If failure time has an exponential distribution with parameter $\lambda$,

$$V_k \overset{d}{=} \sum_{j=1}^{k-1} U_j, \qquad (B.1)$$

where $V_k$ is defined in (2.3) and $U_j$ are independent uniform random variables on
(0,1) for $j = 1, \ldots, k-1$.

Proof  of  Theorem  2.2.2 

The proof of Theorem 2.2.2 will be limited to the case $k = n$. It can be seen,
from the structure of $V_k$ in (2.3) and Theorem 2.2.1, that the proof of Theorem 2.2.2
for $k < n$ is the same as that for $k = n$. It is easy to see that by definition of $V_n$ in
(2.2)

$$V_n = (n-1)X_1 + (n-2)X_2 + \cdots + X_{n-1}.$$

Denote the distribution function of the random variable on the right-hand side of
(B.1) with $k = n$ by $G$. To finish the proof we shall prove that conditional on $\sum D_{n;i}$,
$V_n$ has the same distribution $G$.

The moment generating function of the random variable on the right-hand side
of (B.1) with $k = n$ is

$$E[\exp\{t(U_1 + \cdots + U_{n-1})\}] = \left[\frac{\exp(t) - 1}{t}\right]^{n-1}. \qquad (B.2)$$

As we have shown in Lemma 2.2.2, conditional on $\sum D_{n;i}$, $X_1, \ldots, X_{n-1}$ have a
uniform distribution over the simplex (A.2). In order to prove that conditional on
$\sum D_{n;i}$, $V_n$ has the same distribution $G$, we may assume that $X_1, \ldots, X_{n-1}$ have a
uniform distribution over the simplex (A.2) and show that $V_n = (n-1)X_1 + (n-2)X_2 +
\cdots + X_{n-1}$ has the same moment generating function as the right-hand side of (B.1).
That is, we must show

$$E[\exp(tV_n)] = \left[\frac{\exp(t) - 1}{t}\right]^{n-1}. \qquad (B.3)$$

When $n = 2$, (B.3) clearly holds.

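By an induction argument, suppose (B.3) holds for $n - 1$. We will show that it
holds for $n$. Next we compute

$$E[\exp(tV_n)] = E[\exp t\{(n-1)X_1 + (n-2)X_2 + \cdots + X_{n-1}\}]$$

$$= E\,E[\exp t\{(n-1)X_1 + (n-2)X_2 + \cdots + X_{n-1}\} \mid X_1]$$

$$= \int_0^1 E[\exp t\{(n-1)X_1 + (n-2)X_2 + \cdots + X_{n-1}\} \mid X_1 = x_1]\, (1 - x_1)^{n-2}(n-1)\, dx_1$$

$$= \int_0^1 \exp[t(n-1)x_1]\, E[\exp t\{(n-2)X_2 + \cdots + X_{n-1}\} \mid X_1 = x_1]\, (1 - x_1)^{n-2}(n-1)\, dx_1$$

$$= \int_0^1 \exp[t(n-1)x_1]\, E\!\left[\exp\!\left(t(1 - x_1)\left\{(n-2)\frac{X_2}{1 - x_1} + \cdots + \frac{X_{n-1}}{1 - x_1}\right\}\right) \,\Big|\, X_1 = x_1\right] (1 - x_1)^{n-2}(n-1)\, dx_1.$$

It can be shown that conditional on $X_1 = x_1$, $Y_i = X_i/(1 - x_1)$, $i = 2, \ldots, n-1$, have a uniform
distribution over the simplex:

$$y_i > 0, \quad i = 2, \ldots, n-1, \qquad \sum_{i=2}^{n-1} y_i < 1.$$

By the induction assumption, we have that conditional on $X_1 = x_1$, the moment
generating function of $(n-2)Y_2 + \cdots + Y_{n-1}$, evaluated at $t(1 - x_1)$, is

$$\left[\frac{\exp(t(1 - x_1)) - 1}{t(1 - x_1)}\right]^{n-2}.$$

Hence, we obtain

$$E[\exp(tV_n)] = \int_0^1 \exp[t(n-1)x_1] \left[\frac{\exp(t(1 - x_1)) - 1}{t(1 - x_1)}\right]^{n-2} (1 - x_1)^{n-2}(n-1)\, dx_1$$

$$= \int_0^1 (n-1) \exp(t x_1) \left[\frac{\exp(t) - \exp(x_1 t)}{t}\right]^{n-2} dx_1$$

$$= -\left[\frac{\exp(t) - \exp(x_1 t)}{t}\right]^{n-1} \Bigg|_{x_1 = 0}^{x_1 = 1}$$

$$= \left[\frac{\exp(t) - 1}{t}\right]^{n-1}.$$

This completes the proof.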
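The identity (B.3) can also be checked numerically. The following Python sketch draws points uniformly from the simplex (A.2) by normalizing independent exponential variables (a standard construction, assumed here and not taken from the text), forms $V_n = (n-1)X_1 + (n-2)X_2 + \cdots + X_{n-1}$, and compares the empirical moment generating function with $[(\exp(t) - 1)/t]^{n-1}$. The values of `n`, `t`, the number of replications and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, t = 6, 200_000, 0.7

# Uniform points on the simplex (A.2): first n - 1 coordinates of
# normalized independent exponential variables.
e = rng.exponential(size=(reps, n))
x = e[:, : n - 1] / e.sum(axis=1, keepdims=True)

# V_n = (n-1) X_1 + (n-2) X_2 + ... + 1 * X_{n-1}
coef = np.arange(n - 1, 0, -1)
v = x @ coef

mgf_empirical = np.exp(t * v).mean()
mgf_theoretical = ((np.exp(t) - 1.0) / t) ** (n - 1)
print(float(mgf_empirical), float(mgf_theoretical), float(v.mean()))
```

The sample mean of `v` should be close to $(n-1)/2$, the mean of a sum of $n - 1$ independent uniform variables on (0,1).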
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